Abstract

In this paper we derive tight bounds on the expected value of products of low influence functions
defined on correlated probability spaces. The proofs are based on extending Fourier theory
to an arbitrary number of correlated probability spaces, on a generalization of an invariance principle
recently obtained with O'Donnell and Oleszkiewicz for multilinear polynomials with low
influences and bounded degree and on properties of multi-dimensional Gaussian distributions.

We present two applications of the new bounds to the theory of social choice. We show 
that Majority is asymptotically the most predictable function among all low influence functions 
given a random sample of the voters. Moreover, we derive an almost tight bound in the context 
of Condorcet aggregation and low influence voting schemes on a large number of candidates. 
In particular, we show that for every low influence aggregation function, the probability that
Condorcet voting on $k$ candidates will result in a unique candidate that is preferable to all
others is $k^{-1+o(1)}$. This matches the asymptotic behavior of the majority function for which the
probability is $k^{-1-o(1)}$.

A number of applications in hardness of approximation in theoretical computer science were 
obtained using the results derived here in subsequent work by Raghavendra and by Austrin and 
Mossel. 

A different type of application involves hyper-graphs and arithmetic relations in product
spaces. For example, we show that if $A \subset \mathbb{Z}_m^n$ is of low influences, then the number of $k$-tuples
$(x_1, \ldots, x_k) \in A^k$ satisfying $\sum_{i=1}^k x_i \in B^n \bmod m$, where $B \subset [m]$ satisfies $|B| \geq 2$, is
$(1 \pm o(1)) P[A]^k (m^{k-1} |B|)^n$, which is the same as if $A$ were a random set of probability $P[A]$.
Our results also show that for a general set $A$ without any restriction on the influences there
exists a set of coordinates $S \subset [n]$ with $|S| = O(1)$ such that if
$C = \{x : \exists y \in A, y_{[n] \setminus S} = x_{[n] \setminus S}\}$ then the number of $k$-tuples $(x_1, \ldots, x_k) \in C^k$ satisfying
$\sum_{i=1}^k x_i \in B^n \bmod m$ is $(1 \pm o(1)) P[C]^k (m^{k-1} |B|)^n$.
by ONR award N00014-07-1-0506. Part of this work was carried out while the author was visiting IPAM, UCLA
1 Introduction



1.1 Harmonic analysis of boolean functions 

This paper studies low influence functions $f : \Omega^n \to [0, 1]$, where $(\Omega^n, \mu^n)$ is a product probability
space and where the influence of the $i$th coordinate on $f$, denoted by $\mathrm{Inf}_i(f)$, is defined by

$\mathrm{Inf}_i(f) = E[\mathrm{Var}[f(X_1, \ldots, X_n) \mid X_j, 1 \leq j \leq n, j \neq i]]$, (1)

where for any set $S \subset [n]$ the conditional variance $\mathrm{Var}[f(X_1, \ldots, X_n) \mid X_i, i \in S]$ is defined via:

$\mathrm{Var}[f(X_1, \ldots, X_n) \mid X_i, i \in S] = E[(f(X_1, \ldots, X_n) - E[f(X_1, \ldots, X_n) \mid X_i, i \in S])^2 \mid X_i, i \in S]$.
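
For the uniform measure on $\{-1,1\}^n$, definition (1) can be evaluated by brute force, because conditioning on all coordinates except $i$ leaves a two-point distribution. The following is a minimal sketch under that assumption; the names `influence`, `dictator` and `majority` are illustrative and do not come from the paper.

```python
from itertools import product

def influence(f, n, i):
    """Inf_i(f) = E[Var[f(X) | X_j, j != i]] under the uniform measure on {-1,1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        if x[i] == 1:  # visit each pair (x, x with coordinate i flipped) once
            y = x[:i] + (-1,) + x[i + 1:]
            # variance of the uniform two-point distribution on {f(x), f(y)}
            total += ((f(x) - f(y)) / 2) ** 2
    return total / 2 ** (n - 1)

def dictator(x):
    return x[0]

def majority(x):
    # assumes an odd number of voters, so the sum is never zero
    return 1 if sum(x) > 0 else -1

n = 5
inf_dictator = [influence(dictator, n, i) for i in range(n)]
inf_majority = [influence(majority, n, i) for i in range(n)]
```

For five voters the dictator has one coordinate of influence 1, while every coordinate of majority has influence 6/16, the probability that the other four votes are tied.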

The study of low influence functions is motivated by applications from the theory of social choice in 
mathematical economics, by applications in the theory of hardness of approximation in theoretical 
computer science and by problems in additive number theory. We refer the reader to some recent 
papers [18], [19], [21], [22], [7], [30], [12] for motivation and general background. The main theorems
established here provide tight bounds on the expected value of the product of functions defined on 
correlated probability spaces. These in turn imply some new results in the theory of social choice 
and in the theory of hyper-graphs. Applications to hardness of approximation in computer science
were derived in subsequent work in [I] and [26].

In our main result we consider a probability measure $P$ defined on a space $\prod_{i=1}^k \Omega^{(i)}$. Letting
$f_i : (\Omega^{(i)})^n \to [0,1]$, $1 \leq i \leq k$, be a collection of low influence functions we derive tight
bounds on $E[f_1 \cdots f_k]$ in terms of $E[f_1], \ldots, E[f_k]$ and a measure of correlation between the spaces
$\Omega^{(1)}, \ldots, \Omega^{(k)}$. The bounds are expressed in terms of extremal probabilities in Gaussian space, that
can be calculated in the case $k = 2$. When $k > 2$ and $P$ is a pairwise independent distribution our
bounds show that $E[f_1 \cdots f_k]$ is close to $\prod_{i=1}^k E[f_i]$. We also apply a simple recursive argument
in order to obtain results for general functions not necessarily of low influences. The results show 
that the bounds for low influence functions hold for general functions after the functions have been 
"modified" in a bounded number of coordinates. The rest of the introduction is devoted to various 
applications followed by statements of the main technical results. 



1.2 Prediction Of Low Influence Voting 

Suppose $n$ voters are to make a binary decision. Assume that the outcome of the vote is determined
by a social choice function $f : \{-1,1\}^n \to \{-1,1\}$, so that the outcome of the vote is $f(x_1, \ldots, x_n)$
where $x_i \in \{-1,1\}$ is the vote of voter $i$. We assume that the votes are independent, each $\pm 1$ with
probability $1/2$. It is natural to assume that the function $f$ satisfies $f(-x) = -f(x)$, i.e., it does not
discriminate between the two candidates. Note that this implies that $E[f] = 0$ under the uniform
distribution. A natural way to try and predict the outcome of the vote is to sample a subset of
the voters, by sampling each voter independently with probability $p$. Conditioned on a vector $X$
of votes the distribution of $Y$, the sampled votes, is i.i.d. where $Y_i = X_i$ with probability $p$ and
$Y_i = *$ (for unknown) otherwise.

Conditioned on $Y = y$, the vector of sampled votes, the optimal prediction of the outcome of
the vote is given by $\mathrm{sgn}((Tf)(y))$ where

$(Tf)(y) = E[f(X) \mid Y = y]$. (2)

This implies that the probability of correct prediction (also called predictability) is given by

$P[f = \mathrm{sgn}(Tf)] = \frac{1}{2}(1 + E[f \, \mathrm{sgn}(Tf)])$.
For example, when $f(x) = x_1$ is the dictator function, we have $E[f \, \mathrm{sgn}(Tf)] = p$, corresponding to
the trivial fact that the outcome of the election is known when voter 1 is sampled and is $\pm 1$ with
probability 1/2 otherwise. The notion of predictability is natural in statistical contexts. It was also
studied in a more combinatorial context in [24].
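
The quantity $E[f \, \mathrm{sgn}(Tf)]$ equals $E|Tf|$, so for small $n$ it can be computed exactly by enumerating which voters are sampled and what the sample reveals. The sketch below does this for the uniform measure; the function name `predictability_correlation` and the encoding of an unsampled vote as 0 are assumptions made for illustration.

```python
from itertools import product

def predictability_correlation(f, n, p):
    """Exact E[f(X) sgn((Tf)(Y))] = E|(Tf)(Y)| for uniform X in {-1,1}^n,
    where each vote is revealed independently with probability p."""
    total = 0.0
    for mask in product((False, True), repeat=n):  # which voters are sampled
        k = sum(mask)
        p_mask = p ** k * (1 - p) ** (n - k)
        sums = {}  # sample y -> sum of f over the vote vectors consistent with y
        for x in product((-1, 1), repeat=n):
            y = tuple(xi if m else 0 for xi, m in zip(x, mask))
            sums[y] = sums.get(y, 0.0) + f(x)
        # each x has probability 2^-n, so this is E|Tf| given the mask
        total += p_mask * sum(abs(s) for s in sums.values()) / 2 ** n
    return total

def dictator(x):
    return x[0]

def majority(x):
    # assumes an odd number of voters
    return 1 if sum(x) > 0 else -1
```

With three voters and $p = 0.1$ the dictator gives $0.1$, while majority gives $0.136$, so even at this small size majority is the easier function to predict from a sparse sample.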

In the first application presented here we show that 

Theorem 1.1 ("Majority Is Most Predictable") Let $0 < p < 1$ and $\epsilon > 0$ be given. Then
there exists a $\tau > 0$ such that if $f : \{-1,1\}^n \to [-1,1]$ satisfies $E[f] = 0$ and $\mathrm{Inf}_i(f) < \tau$ for all $i$,
then

$E[f \, \mathrm{sgn}(Tf)] < \frac{2}{\pi} \arcsin \sqrt{p} + \epsilon$, (3)

where $T$ is defined in (2).

Moreover, it follows from the central limit theorem (see Section 7.5; a version of this calculation
also appears in [24]) that if $\mathrm{Maj}_n(x_1, \ldots, x_n) = \mathrm{sgn}(\sum_{i=1}^n x_i)$, then

$\lim_{n \to \infty} E[\mathrm{Maj}_n \, \mathrm{sgn}(T \mathrm{Maj}_n)] = \frac{2}{\pi} \arcsin \sqrt{p}$.

Remark 1.2 Note that Theorem 1.1 proves a weaker statement than showing that Majority is the
most predictable function. The statement only asserts that if a function has low enough influences
then its predictability cannot be more than $\epsilon$ larger than the asymptotic predictability value achieved
by the majority function when the number of voters $n \to \infty$. This slightly inaccurate title of the
theorem is in line with previous results such as the "Majority is Stablest Theorem" (see below).
Similar language may be used later when informally discussing statements of various theorems.

Remark 1.3 One may wonder if for a finite $n$, among all functions $f : \{-1,1\}^n \to \{-1,1\}$ with
$E[f] = 0$, majority is the most predictable function. Note that the predictability of the dictator
function $f(x) = x_1$ is given by $p$, and $\frac{2}{\pi} \arcsin \sqrt{p} > p$ for $p \to 0$. Therefore when $p$ is small and
$n$ is large the majority function is more predictable than the dictator function. However, note that
when $p \to 1$ we have $p > \frac{2}{\pi} \arcsin \sqrt{p}$ and therefore for values of $p$ close to 1 and large $n$ the dictator
function is more predictable than the majority function.
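
The comparison in Remark 1.3 is easy to check numerically. The snippet below is only an illustration; `majority_limit` is a name introduced here for the limiting value $\frac{2}{\pi} \arcsin \sqrt{p}$.

```python
import math

def majority_limit(p):
    """Limiting value of E[Maj_n sgn(T Maj_n)] as n grows, for sampling probability p."""
    return (2 / math.pi) * math.asin(math.sqrt(p))

# (p, majority limit, dictator value) for a few sampling probabilities
comparison = [(p, majority_limit(p), p) for p in (0.1, 0.5, 0.9)]
```

At $p = 1/2$ the two values coincide, since $\arcsin \sqrt{1/2} = \pi/4$; majority is ahead at $p = 0.1$ and the dictator is ahead at $p = 0.9$.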

We note that the bound obtained in Theorem 1.1 is reminiscent of the Majority is Stablest
theorem [21, 22] as both involve the arcsin function. However, the two theorems are quite different.
The Majority is Stablest theorem asserts that under the same condition as in Theorem 1.1 it holds
that

$E[f(X) f(Y)] < \frac{2}{\pi} \arcsin p + \epsilon$,

where $(X_i, Y_i) \in \{-1,1\}^2$ are i.i.d. with $E[X_i] = E[Y_i] = 0$ and $E[X_i Y_i] = p$. Thus "Majority is
Stablest" considers two correlated voting vectors, while "Majority is Most Predictable" considers a
sample of one voting vector. In fact, both results follow from the more general invariance principle
presented here. We note a further difference between stability and predictability: It is well known
that in the context of "Majority is Stablest", for all $0 < p < 1$, among all boolean functions with
$E[f] = 0$ the maximum of $E[f(X) f(Y)]$ is obtained for dictator functions of the form $f(x) = x_i$.
As discussed above, for $p$ close to 0 and large $n$, the dictator is less predictable than the majority
function.

We also note that the "Ain't over until it's over" Theorem [21, 22] provides a bound under the
same conditions on

$P[Tf > 1 - \delta]$,

for small $\delta$. However, this bound is not tight and does not imply Theorem 1.1. Similarly,
Theorem 1.1 does not imply the "Ain't over until it's over" theorem. The bounds in "Ain't Over Until
It's Over" were derived using invariance of $Tf$ while the bound (3) requires the joint invariance of
$f$ and $Tf$.

1.3 Condorcet Paradoxes 

Suppose $n$ voters rank $k$ candidates. It is assumed that each voter $i$ has a linear order $\sigma_i \in S(k)$
on the candidates. In Condorcet voting, the rankings are aggregated by deciding for each pair of
candidates which one is superior among the $n$ voters.

More formally, the aggregation results in a tournament $G_k$ on the set $[k]$. Recall that $G_k$ is a
tournament on $[k]$ if it is a directed graph on the vertex set $[k]$ such that for all $a, b \in [k]$ either
$(a > b) \in G_k$ or $(b > a) \in G_k$. Given individual rankings $(\sigma_i)_{i=1}^n$ the tournament $G_k$ is defined as
follows.

Let $x^{a>b}(i) = 1$ if $\sigma_i(a) > \sigma_i(b)$, and $x^{a>b}(i) = -1$ if $\sigma_i(a) < \sigma_i(b)$. Note that $x^{b>a} = -x^{a>b}$.

The binary decision between each pair of candidates is performed via an anti-symmetric function
$f : \{-1,1\}^n \to \{-1,1\}$ so that $f(-x) = -f(x)$ for all $x \in \{-1,1\}^n$. The tournament $G_k = G_k(\sigma; f)$
is then defined by letting $(a > b) \in G_k$ if and only if $f(x^{a>b}) = 1$.

Note that there are $2^{\binom{k}{2}}$ tournaments while there are only $k! = 2^{\Theta(k \log k)}$ linear rankings. For
the purposes of social choice, some tournaments make more sense than others.

Definition 1.4 We say that a tournament $G_k$ is linear if it is acyclic. We will write $\mathrm{Acyc}(G_k)$
for the logical statement that $G_k$ is acyclic. Non-linear tournaments are often referred to as
non-rational in economics as they represent an order where there are 3 candidates $a$, $b$ and $c$ such that
$a$ is preferred to $b$, $b$ is preferred to $c$ and $c$ is preferred to $a$.

We say that the tournament $G_k$ is a unique max tournament if there is a candidate $a \in [k]$ such
that for all $b \neq a$ it holds that $(a > b) \in G_k$. We write $\mathrm{UniqMax}(G_k)$ for the logical statement that
$G_k$ has a unique max. Note that the unique max property is weaker than linearity. It corresponds
to the fact that there is a candidate that dominates all other candidates.
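
The construction of $G_k(\sigma; f)$ and the two predicates of Definition 1.4 can be sketched as follows. This is an illustrative implementation, not code from the paper: candidates are numbered from 0, `rankings[i][a]` stands for $\sigma_i(a)$, and acyclicity is tested through the standard fact that a tournament is transitive exactly when its out-degrees are $0, 1, \ldots, k-1$.

```python
def majority(x):
    # anti-symmetric when the number of voters is odd
    return 1 if sum(x) > 0 else -1

def tournament(rankings, k, f=majority):
    """Set of ordered pairs (a, b) with (a > b) in G_k(sigma; f)."""
    edges = set()
    for a in range(k):
        for b in range(a + 1, k):
            x = tuple(1 if s[a] > s[b] else -1 for s in rankings)  # the vector x^{a>b}
            edges.add((a, b) if f(x) == 1 else (b, a))
    return edges

def unique_max(edges, k):
    return any(all((a, b) in edges for b in range(k) if b != a) for a in range(k))

def acyclic(edges, k):
    out_degree = [sum(1 for (a, _) in edges if a == v) for v in range(k)]
    return sorted(out_degree) == list(range(k))

# Condorcet's classical profile: voters rank a>b>c, b>c>a and c>a>b.
paradox = [(3, 2, 1), (1, 3, 2), (2, 1, 3)]
cyclic_outcome = tournament(paradox, 3)

unanimous = [(3, 2, 1)] * 3
linear_outcome = tournament(unanimous, 3)
```

The first profile produces the 3-cycle, which is neither acyclic nor unique max; the unanimous profile produces a linear tournament.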

Following [16, 15], we consider the probability distribution over $n$ voters, where the voters have
independent preferences and each one chooses a ranking uniformly at random among all $k!$ orderings.
Note that the marginal distribution on each vector $x^{a>b}$ is the uniform distribution over $\{-1,1\}^n$
and that if $f : \{-1,1\}^n \to \{-1,1\}$ is anti-symmetric then $E[f] = 0$.

The case that is now understood is $k = 3$. Note that in this case $G_3$ is unique max if and
only if it is linear. Kalai [15] studied the probability of a rational outcome given that the $n$ voters
vote independently and at random from the 6 possible rational rankings. He showed that the
probability of a rational outcome in this case may be expressed as $\frac{3}{4}(1 + E[f \, Tf])$ where $T$ is the
Bonami-Beckner operator with parameter $\rho = 1/3$. The Bonami-Beckner operator may be defined
as follows. Let $(X_i, Y_i) \in \{-1,1\}^2$ be i.i.d. with $E[X_i] = E[Y_i] = 0$ and $E[X_i Y_i] = \rho$ for $1 \leq i \leq n$.
For $f : \{-1,1\}^n \to \mathbb{R}$ and $x \in \{-1,1\}^n$, the Bonami-Beckner operator $T$ applied to $f$ is defined
via $(Tf)(x_1, \ldots, x_n) = E[f(Y_1, \ldots, Y_n) \mid X_1 = x_1, \ldots, X_n = x_n]$.
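
As a sanity check of Kalai's identity, the sketch below compares, for the majority of three voters, the exact probability of an acyclic outcome with $\frac{3}{4}(1 + E[f \, Tf])$ at $\rho = 1/3$. Both sides are computed by exhaustive enumeration with exact rational arithmetic; the function names are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product

def maj(x):
    return 1 if sum(x) > 0 else -1

def rational_probability(f, n):
    """Exact P[the pairwise outcomes on 3 candidates are acyclic], uniform i.i.d. rankings."""
    rankings = list(permutations(range(3)))
    good = 0
    for profile in product(rankings, repeat=n):
        ab = f(tuple(1 if s[0] > s[1] else -1 for s in profile))
        bc = f(tuple(1 if s[1] > s[2] else -1 for s in profile))
        ca = f(tuple(1 if s[2] > s[0] else -1 for s in profile))
        if not (ab == bc == ca):  # a 3-cycle occurs exactly when all three agree
            good += 1
    return Fraction(good, len(rankings) ** n)

def noise_stability(f, n, rho):
    """Exact E[f(X) f(Y)] = E[f Tf] for rho-correlated uniform X, Y in {-1,1}^n."""
    total = Fraction(0)
    for x in product((-1, 1), repeat=n):
        for y in product((-1, 1), repeat=n):
            agree = sum(a == b for a, b in zip(x, y))
            weight = ((1 + rho) / 2) ** agree * ((1 - rho) / 2) ** (n - agree)
            total += weight * f(x) * f(y)
    return total / 2 ** n

lhs = rational_probability(maj, 3)
rhs = Fraction(3, 4) * (1 + noise_stability(maj, 3, Fraction(1, 3)))
```

Both quantities equal $17/18$, matching the classical fact that three voters and three candidates produce a Condorcet cycle with probability $1/18$.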

It is natural to ask which function $f$ with small influences is most likely to produce a rational
outcome. Instead of considering small influences, Kalai considered the essentially stronger assumption
that $f$ is monotone and "transitive-symmetric"; i.e., that for all $1 \leq i < j \leq n$ there exists a
permutation $\sigma$ on $[n]$ with $\sigma(i) = j$ such that $f(x_1, \ldots, x_n) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ for all $(x_1, \ldots, x_n)$.
Kalai conjectured that as $n \to \infty$, the maximum of $\frac{3}{4}(1 + E[f \, Tf])$ among all transitive-symmetric
functions approaches the same limit as $\lim_{n \to \infty} \frac{3}{4}(1 + E[\mathrm{Maj}_n \, T \mathrm{Maj}_n])$. This was proven using the
Majority is Stablest Theorem [21, 22]. Here we obtain similar results for any value of $k$. Our result
is not tight, but almost tight. More specifically we show that:

Theorem 1.5 ("Majority is best for Condorcet") Consider Condorcet voting on $k$ candidates.
Then for all $\epsilon > 0$ there exists $\tau = \tau(k, \epsilon) > 0$ such that if $f : \{-1,1\}^n \to \{-1,1\}$ is anti-symmetric
and $\mathrm{Inf}_i(f) < \tau$ for all $i$, then

$P[\mathrm{UniqMax}(G_k(\sigma; f))] < k^{-1+o_k(1)} + \epsilon$. (4)

Moreover for $f = \mathrm{Maj}_n$ we have $\mathrm{Inf}_i(f) = O(n^{-1/2})$ and it holds that

$P[\mathrm{UniqMax}(G_k(\sigma; f))] > k^{-1-o_k(1)} - o_n(1)$. (5)

Interestingly, we are not able to derive similar results for Acyc. We do calculate the probability 
that Acyc holds for majority. 

Proposition 1.6 We have

$\lim_{n \to \infty} P[\mathrm{Acyc}(G_k(\sigma; \mathrm{Maj}_n))] = \exp(-\Theta(k^{5/3}))$. (6)

We note that results in economics [I] have shown that for majority vote the probability that the
outcome will contain a Hamiltonian cycle when the number of voters goes to infinity is $1 - o_k(1)$.

1.4 Hyper Graph and Additive Applications 

Here we discuss some applications concerning hyper-graph problems. We let $\Omega$ be a finite set
equipped with the uniform probability measure denoted $P$. We let $R \subset \Omega^k$ denote a $k$-wise relation.
For sets $A_1, \ldots, A_k \subset \Omega^n$ we will be interested in the number of $k$-tuples $x^1 \in A_1, \ldots, x^k \in A_k$
satisfying the relation $R$ in all coordinates, i.e., $(x^1_i, \ldots, x^k_i) \in R$ for all $i$. Assume below that $R$
satisfies the following two properties:

• For all $a \in \Omega$ and all $1 \leq j \leq k$ it holds that $P[x^j = a \mid (x^1, \ldots, x^k) \in R] = |\Omega|^{-1}$. (This
assumption is actually not needed for the general statement; we state it for simplicity only.)

• The relation $R$ is connected. This means that for all $x, y \in R$ there exists a path $x =
y(0), y(1), \ldots, y(r) = y$ in $R$ such that $y(i)$ and $y(i+1)$ differ in one coordinate only.

We will say that the relation $R \subset \Omega^k$ is pairwise smooth if for all $i, j \in [k]$ and $a, b \in \Omega$ it holds that

$P[x^i = a, x^j = b \mid (x^1, \ldots, x^k) \in R] = P[x^i = a \mid (x^1, \ldots, x^k) \in R] \, P[x^j = b \mid (x^1, \ldots, x^k) \in R]$.

As a concrete example, consider the case where $\Omega = \mathbb{Z}_m$ and $R$ consists of all $k$-tuples satisfying
$\sum_{i=1}^k x_i \in B \bmod m$ where $B \subset \mathbb{Z}_m$. When $k \geq 2$ we have $P[x_i = a \mid R] = m^{-1}$ for all $i$ and $a$.
When $k \geq 3$, we have pairwise smoothness. The connectivity condition holds whenever $|B| > 1$.
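
For small parameters the two properties of this example can be verified by direct enumeration. The sketch below is illustrative only (the helper names are not from the paper) and checks uniform marginals and pairwise smoothness, for distinct coordinates $i \neq j$, of the relation $\sum_i x_i \in B \bmod m$.

```python
from fractions import Fraction
from itertools import product

def sum_relation(m, k, B):
    """All k-tuples over Z_m whose coordinate sum lies in B modulo m."""
    return [x for x in product(range(m), repeat=k) if sum(x) % m in B]

def prob(R, constraints):
    """P[x_i = a for every (i, a) in constraints | x in R] under the uniform measure."""
    hits = sum(1 for x in R if all(x[i] == a for i, a in constraints))
    return Fraction(hits, len(R))

def uniform_marginals(R, m, k):
    return all(prob(R, [(i, a)]) == Fraction(1, m)
               for i in range(k) for a in range(m))

def pairwise_smooth(R, m, k):
    return all(prob(R, [(i, a), (j, b)]) == prob(R, [(i, a)]) * prob(R, [(j, b)])
               for i in range(k) for j in range(k) if i != j
               for a in range(m) for b in range(m))

R2 = sum_relation(3, 2, {0, 1})
R3 = sum_relation(3, 3, {0, 1})
```

With $m = 3$ and $B = \{0, 1\}$, both relations have uniform marginals, but only the one with $k = 3$ is pairwise smooth.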

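For a set $A \subset \mathbb{Z}_m^n$ and $S \subset [n]$ we define

$\overline{A}^S = \{y : \exists x \in A, x_{[n] \setminus S} = y_{[n] \setminus S}\}$, $\underline{A}_S = \{y : \forall x$ s.t. $x_{[n] \setminus S} = y_{[n] \setminus S}$, it holds that $x \in A\}$.

Our main result in the context of hyper graphs is the following.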
Theorem 1.7 Let $R$ be a connected relation on $\Omega^k$. Then there exist two continuous functions
$\underline{\Gamma} : (0,1)^k \to (0,1)$ and $\overline{\Gamma} : (0,1)^k \to (0,1)$ such that for every $\epsilon > 0$ there exists a $\tau > 0$ such that
if $A_1, \ldots, A_k \subset \Omega^n$ are sets with $\mathrm{Inf}_i(A_j) < \tau$ for all $i$ and $j$ then

$(\underline{\Gamma}(P[A_1], \ldots, P[A_k]) - \epsilon) P[R^n] < P[R^n \cap (A_1, \ldots, A_k)] < (\overline{\Gamma}(P[A_1], \ldots, P[A_k]) + \epsilon) P[R^n]$.

If $R$ is pairwise smooth, then:

$|P[R^n \cap (A_1, \ldots, A_k)] - P[R^n] P[A_1] \cdots P[A_k]| < \epsilon P[R^n]$.

Moreover, one can take r = (°( mk+1 ^(V^A) _ 

For general sets $A_1, \ldots, A_k$, not necessarily of low influences, there exists a set $S$ of coordinates
such that $|S| < O(1/\tau)$ and the statements above hold for $\overline{A_1}^S, \ldots, \overline{A_k}^S$ and for $\underline{A_1}_S, \ldots, \underline{A_k}_S$.

1.5 Correlated Spaces 

A central concept that is extensively studied and repeatedly used in the paper is that of correlated
probability spaces. The notion of correlation between two probability spaces used here is the same
as the "maximum correlation coefficient" introduced by Hirschfeld and Gebelein [11]. We will later
show how to relate correlated spaces to noise operators.

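Definition 1.8 Given a probability measure $P$ defined on $\prod_{i=1}^k \Omega^{(i)}$, we say that $\Omega^{(1)}, \ldots, \Omega^{(k)}$ are
correlated spaces. For $A \subset \Omega^{(i)}$ we let

$P[A] = P[(\omega_1, \ldots, \omega_k) \in \prod_{j=1}^k \Omega^{(j)} : \omega_i \in A]$,

and similarly $E[f]$ for $f : \Omega^{(i)} \to \mathbb{R}$. We will abuse notation by writing $P[A]$ for $P^n[A]$ for
$A \subset (\prod_{i=1}^k \Omega^{(i)})^n$ or $A \subset (\Omega^{(i)})^n$ and similarly for $E$.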

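Definition 1.9 Given two linear subspaces $A$ and $B$ of $L^2(P)$ we define the correlation between $A$
and $B$ by

$\rho(A, B; P) = \rho(A, B) = \sup\{\mathrm{Cov}[f, g] : f \in A, g \in B, \mathrm{Var}[f] = \mathrm{Var}[g] = 1\}$. (7)

Let $\Omega = (\Omega^{(1)} \times \Omega^{(2)}, P)$. We define the correlation $\rho(\Omega^{(1)}, \Omega^{(2)}; P)$ by letting:

$\rho(\Omega^{(1)}, \Omega^{(2)}; P) = \rho(L^2(\Omega^{(1)}, P), L^2(\Omega^{(2)}, P); P)$. (8)

More generally, let $\Omega = (\prod_{i=1}^k \Omega^{(i)}, P)$ and for a subset $S \subset [k]$, write $\Omega^{(S)} = \prod_{i \in S} \Omega^{(i)}$. The
correlation vector $\underline{\rho}(\Omega^{(1)}, \ldots, \Omega^{(k)}; P)$ is a length $k - 1$ vector whose $i$'th coordinate is given by

$\rho(\prod_{j=1}^{i} \Omega^{(j)}, \prod_{j=i+1}^{k} \Omega^{(j)}; P)$

for $1 \leq i \leq k - 1$. The correlation $\rho(\Omega^{(1)}, \ldots, \Omega^{(k)}; P)$ is defined by letting:

$\rho(\Omega^{(1)}, \ldots, \Omega^{(k)}; P) = \max_{1 \leq i \leq k} \rho(\prod_{j=1}^{i-1} \Omega^{(j)} \times \prod_{j=i+1}^{k} \Omega^{(j)}, \Omega^{(i)}; P)$. (9)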
When the probability measure $P$ is clear from the context we will write $\rho(\Omega^{(1)}, \ldots, \Omega^{(k)})$
for $\rho(\Omega^{(1)}, \ldots, \Omega^{(k)}; P)$ etc.

Remark 1.10 It is easy to see that $\rho(\Omega^{(1)}, \Omega^{(2)}; P)$ is the second singular value of the conditional
expectation operator mapping $f \in L^2(\Omega^{(2)}, P)$ to $g(x) = E[f(Y) \mid X = x] \in L^2(\Omega^{(1)}, P)$. Thus
$\rho(\Omega^{(1)}, \Omega^{(2)}; P)$ is the second singular value of the matrix corresponding to the operator $T$ with
respect to orthonormal bases of $L^2(\Omega^{(1)}, P)$ and $L^2(\Omega^{(2)}, P)$.
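
For finite spaces the remark gives a direct numerical recipe. With respect to orthonormal bases the operator has matrix $M(x, y) = P(x, y) / \sqrt{P(x) P(y)}$, whose largest singular value is 1. The sketch below, an illustration using only the standard library, removes that top direction and runs power iteration on what remains; the function name, the iteration count and the starting vector are arbitrary choices made here.

```python
import math

def correlation(P):
    """rho(Omega1, Omega2; P) for a joint pmf given as a matrix P[x][y] with
    positive marginals, computed as the second singular value of
    M[x][y] = P[x][y] / sqrt(p1[x] * p2[y])."""
    rows, cols = len(P), len(P[0])
    p1 = [sum(P[x]) for x in range(rows)]
    p2 = [sum(P[x][y] for x in range(rows)) for y in range(cols)]
    M = [[P[x][y] / math.sqrt(p1[x] * p2[y]) for y in range(cols)] for x in range(rows)]
    # M M^T has top eigenvalue 1 with unit eigenvector u[x] = sqrt(p1[x]); deflate it.
    A = [[sum(M[x][y] * M[z][y] for y in range(cols)) - math.sqrt(p1[x] * p1[z])
          for z in range(rows)] for x in range(rows)]
    # Power iteration on the deflated matrix; this assumes the starting vector
    # is not orthogonal to the leading eigenvector of A.
    v = [1.0 + 0.1 * x for x in range(rows)]
    lam = 0.0
    for _ in range(500):
        w = [sum(A[x][z] * v[z] for z in range(rows)) for x in range(rows)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm < 1e-15:
            return 0.0
        v = [t / norm for t in w]
        lam = norm
    return math.sqrt(lam)

def two_point_noise(rho):
    """Joint law of a uniform bit and its rho-correlated copy."""
    return [[(1 + rho) / 4, (1 - rho) / 4], [(1 - rho) / 4, (1 + rho) / 4]]
```

For two $\rho$-correlated uniform bits the routine returns $\rho$, and for independent bits it returns 0.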

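Definition 1.11 Given $(\prod_{i=1}^k \Omega^{(i)}, P)$, we say that $\Omega^{(1)}, \ldots, \Omega^{(k)}$ are $r$-wise independent if for all
$S \subset [k]$ with $|S| \leq r$ and for all $\prod_{i \in S} A_i \subset \prod_{i \in S} \Omega^{(i)}$ it holds that

$P[\prod_{i \in S} A_i] = \prod_{i \in S} P[A_i]$.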

The notion of r-wise independence is central in computer science and discrete mathematics, in 
particular in the context of randomized algorithms and computational complexity. 

1.6 Gaussian Stability 

Our main result states bounds in terms of Gaussian stability measures which we discuss next. Let
$\gamma$ be the one dimensional Gaussian measure.

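Definition 1.12 Given $\mu \in [0,1]$, define $\chi_\mu : \mathbb{R} \to \{0, 1\}$ to be the indicator function of the
interval $(-\infty, t]$, where $t$ is chosen so that $E_\gamma[\chi_\mu] = \mu$. Explicitly, $t = \Phi^{-1}(\mu)$, where $\Phi$ denotes
the distribution function of a standard Gaussian. Furthermore, define

$\overline{\Gamma}_\rho(\mu, \nu) = P[X \leq \Phi^{-1}(\mu), Y \leq \Phi^{-1}(\nu)]$,

$\underline{\Gamma}_\rho(\mu, \nu) = P[X \leq \Phi^{-1}(\mu), Y \geq \Phi^{-1}(1 - \nu)]$,

where $(X, Y)$ is a two dimensional Gaussian vector with covariance matrix $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.

Given $(\rho_1, \ldots, \rho_{k-1}) \in [0,1]^{k-1}$ and $(\mu_1, \ldots, \mu_k) \in [0,1]^k$ for $k \geq 3$ we define by induction

$\overline{\Gamma}_{\rho_1, \ldots, \rho_{k-1}}(\mu_1, \ldots, \mu_k) = \overline{\Gamma}_{\rho_1}(\mu_1, \overline{\Gamma}_{\rho_2, \ldots, \rho_{k-1}}(\mu_2, \ldots, \mu_k))$,

and similarly $\underline{\Gamma}()$.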
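
The two bivariate Gaussian probabilities in this definition can be evaluated numerically by conditioning on $X$: given $X = x$, the variable $Y$ is Gaussian with mean $\rho x$ and variance $1 - \rho^2$. The sketch below uses midpoint quadrature and the standard library `statistics.NormalDist`; the names `gamma_aligned` and `gamma_opposed`, the truncation point $-10$ and the step count are choices made here for illustration, and the code assumes $\mu, \nu \in (0,1)$ and $|\rho| < 1$.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def gamma_aligned(rho, mu, nu, steps=20000):
    """P[X <= Phi^{-1}(mu), Y <= Phi^{-1}(nu)] for standard Gaussians with correlation rho,
    as the integral of phi(x) * Phi((b - rho x) / sqrt(1 - rho^2)) over x <= a."""
    a, b = _STD.inv_cdf(mu), _STD.inv_cdf(nu)
    lo = -10.0  # the Gaussian mass below -10 is negligible
    if a <= lo:
        return 0.0
    h = (a - lo) / steps
    s = math.sqrt(1 - rho * rho)
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += _STD.pdf(x) * _STD.cdf((b - rho * x) / s)
    return total * h

def gamma_opposed(rho, mu, nu, steps=20000):
    """P[X <= Phi^{-1}(mu), Y >= Phi^{-1}(1 - nu)]; replacing Y by -Y shows that
    this is the aligned probability with correlation -rho."""
    return gamma_aligned(-rho, mu, nu, steps)
```

At $\mu = \nu = 1/2$ the values agree with Sheppard's formula $\frac{1}{4} \pm \frac{\arcsin \rho}{2\pi}$, for example $1/3$ and $1/6$ at $\rho = 1/2$.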

1.7 Statements of main results 

We now state our main results. We state the results both for low influence functions and for general
functions. For the latter it is useful to define the following notions:

Definition 1.13 Let $f : \Omega^n \to \mathbb{R}$ and $S \subset [n]$. We define

$\overline{f}^S(x) = \sup(f(y) : y_{[n] \setminus S} = x_{[n] \setminus S})$, $\underline{f}_S(x) = \inf(f(y) : y_{[n] \setminus S} = x_{[n] \setminus S})$.

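Theorem 1.14 Let $(\prod_{j=1}^k \Omega_i^{(j)}, P_i)$, $1 \leq i \leq n$, be a sequence of finite probability spaces such that for
all $1 \leq i \leq n$ the minimum probability of any atom in $\prod_{j=1}^k \Omega_i^{(j)}$ is at least $\alpha$. Assume furthermore
that there exists $\underline{\rho} \in [0,1]^{k-1}$ and $0 \leq \rho < 1$ such that

$\rho(\Omega_i^{(1)}, \ldots, \Omega_i^{(k)}; P_i) \leq \rho$,
$\rho(\Omega_i^{\{1, \ldots, j\}}, \Omega_i^{\{j+1, \ldots, k\}}; P_i) \leq \underline{\rho}(j)$ (10)

for all $i, j$. Then for all $\epsilon > 0$ there exists $\tau > 0$ such that if

$f_j : \prod_{i=1}^n \Omega_i^{(j)} \to [0, 1]$

for $1 \leq j \leq k$ satisfy

$\max_{i,j}(\mathrm{Inf}_i(f_j)) \leq \tau$ (11)

then

$\underline{\Gamma}_{\underline{\rho}}(E[f_1], \ldots, E[f_k]) - \epsilon \leq E[\prod_{j=1}^k f_j] \leq \overline{\Gamma}_{\underline{\rho}}(E[f_1], \ldots, E[f_k]) + \epsilon$. (12)

If instead of (10) we assume that

$\rho(\Omega_i^{(j)}, \Omega_i^{(j')}; P_i) = 0$, (13)

for all $i$ and $j \neq j'$, then

$\prod_{j=1}^k E[f_j] - \epsilon \leq E[\prod_{j=1}^k f_j] \leq \prod_{j=1}^k E[f_j] + \epsilon$. (14)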

One may take



n , log(l/ £ )log(l/ Q K 



A truncation argument allows one to relax the conditions on the influences.

Proposition 1.15 For statement (12) to hold in the case where $k = 2$ it suffices to require that

$\max_i(\min(\mathrm{Inf}_i(f_1), \mathrm{Inf}_i(f_2))) < \tau$ (15)

instead of (11).

In the case where for each $i$ the spaces $\Omega_i^{(1)}, \ldots, \Omega_i^{(k)}$ are $s$-wise independent, for statement (14)
to hold it suffices to require that for all $i$

$|\{j : \mathrm{Inf}_i(f_j) > \tau\}| < s$. (16)

An easy recursive argument allows one to conclude the following result that does not require
low influences (11).

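Proposition 1.16 Consider the setting of Theorem 1.14 without the assumptions on low influences (11).

Assuming (10), there exists a set $S$ of size $O(1/\tau)$ such that the functions $\overline{f_j}^S$ satisfy

$E[\prod_{j=1}^k \overline{f_j}^S] > \underline{\Gamma}_{\underline{\rho}}(E[\overline{f_1}^S], \ldots, E[\overline{f_k}^S]) - \epsilon > \underline{\Gamma}_{\underline{\rho}}(E[f_1], \ldots, E[f_k]) - \epsilon$,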

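and the functions $\underline{f_j}_S$ satisfy

$E[\prod_{j=1}^k \underline{f_j}_S] < \overline{\Gamma}_{\underline{\rho}}(E[\underline{f_1}_S], \ldots, E[\underline{f_k}_S]) + \epsilon < \overline{\Gamma}_{\underline{\rho}}(E[f_1], \ldots, E[f_k]) + \epsilon$.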
Assuming (13), we have

$E[\prod_{j=1}^k \overline{f_j}^S] > \prod_{j=1}^k E[\overline{f_j}^S] - \epsilon > \prod_{j=1}^k E[f_j] - \epsilon$,

and similarly for $\underline{f}$.

1.8 Road Map 

Let us review some of the main techniques we use in this paper. 

• We develop a Fourier theory on correlated spaces in Section 2. Previous work considered
Fourier theory on one product space and reversible operators with respect to that space [7].
Our results here allow us to study non-reversible operators, which in turn allows us to study
products of $k$ correlated spaces. An important fact we prove that is used repeatedly is that
general noise operators respect "Efron-Stein" decomposition. This fact in particular allows 
us to "truncate" functions to their low degree parts when considering the expected value of 
the product of functions on correlated spaces. 

• In order to derive an invariance principle we need to extend the approach of [29, 21, 22] to
prove the joint invariance of a number of multi-linear polynomials. The proof of the extension
appears in Sections 3 and 4. The proof follows the same main steps as in [29, 21, 22], i.e., the
Lindeberg strategy for proving the CLT [20] where invariance is established by switching one
variable at a time.

• In the Gaussian realm, we need to extend Borell's isoperimetric result [6] both in the case of
two collections of Gaussians and in the case of k > 2 collections. This is done in Section 

• The proof of the main result, Theorem 1.14, follows in Section 6. The proof of the extensions
given in Proposition 1.15 uses a truncation argument for which $s$-wise independence plays a
crucial role. The proof of Proposition 1.16 is based on a simple recursive argument.

• In Section 7 we apply the noise bounds in order to derive the social choice results. Some
calculations with the majority function in the social choice setting, in particular showing the
tightness of Theorems 1.1 and 1.5, are given in Section 7.5. We conclude by discussing the
applications to hyper-graphs in Section 8.

1.9 Subsequent Work And Applications in Computer Science 

Subsequent to posting a draft of this paper on the arXiv, two applications of our results to hardness
of approximation in computer science have been established. Both results are in the context of
the Unique Games conjecture in computational complexity [17]. Furthermore, both results consider
an important problem in computer science, namely the problem of solving constraint satisfaction
problems (CSP).

Given a predicate $P : [q]^k \to \{0,1\}$, where $[q] = \{1, \ldots, q\}$ for some integer $q$, we define
Max CSP($P$) to be the algorithmic problem where we are given a set of variables $x_1, \ldots, x_n$ taking
values in $[q]$ and a set of constraints of the form $P(l_1, \ldots, l_k)$, where each $l_i = x_j + a$, where $x_j$ is
one of the variables and $a \in [q]$ is a constant (addition is mod $q$). More generally, in the problem
of Max $k$-CSP$_q$ we are given a set of constraints each involving $k$ of the variables $x_1, \ldots, x_n$. The
most well studied case is the case of $q = 2$, denoted Max $k$-CSP.
The objective is to find an assignment to the variables satisfying as many of the constraints as
possible. The problem of Max $k$-CSP$_q$ is NP-hard for any $k \geq 2$, $q \geq 2$, and as a consequence,
a large body of research is devoted to studying how well the problem can be approximated. We
say that a (randomized) algorithm has approximation ratio $\alpha$ if, for all instances, the algorithm is
guaranteed to find an assignment which (in expectation) satisfies at least $\alpha \cdot \mathrm{Opt}$ of the constraints,
where Opt is the maximum number of simultaneously satisfied constraints, over any assignment.

The results of pQ (see also [2]) show that assuming the Unique Games Conjecture, any predicate
$P$ for which there exists a pairwise independent distribution over $[q]^k$ with uniform marginals,
whose support is contained in $P^{-1}(1)$, is approximation resistant. In other words, there is no
polynomial time algorithm which achieves a better approximation factor than assigning the variables at
random. This result implies in turn that for general $k \geq 3$ and $q \geq 2$, the Max $k$-CSP$_q$ problem is
UG-hard to approximate within $O(kq^2)/q^k + \epsilon$. Moreover, for the special case of $q = 2$, i.e., boolean
variables, it gives hardness of $(k + O(k^{0.525}))/2^k + \epsilon$, improving upon the best previous bound [30]
of $2k/2^k + \epsilon$ by essentially a factor 2. Finally, again for $q = 2$, assuming that the famous Hadamard
Conjecture is true, the results are further improved, and the $O(k^{0.525})$ term can be replaced by the
constant 4.
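
The condition on the predicate can be made concrete for small $k$ and $q$. The sketch below, which is an illustration and not taken from the cited works, computes the fraction of constraints a uniformly random assignment satisfies and tests whether a given distribution on $k$-tuples is pairwise independent with uniform marginals. Values in $[q]$ are represented as `0, ..., q-1`, and only the uniform distribution on the accepting set is tried, which is a simplifying assumption.

```python
from fractions import Fraction
from itertools import product

def random_assignment_ratio(predicate, q, k):
    """Fraction of inputs in [q]^k accepted by the predicate."""
    accepted = sum(1 for z in product(range(q), repeat=k) if predicate(z))
    return Fraction(accepted, q ** k)

def is_balanced_pairwise_independent(dist, q, k):
    """dist maps k-tuples over range(q) to probabilities; requires k >= 2.
    Every pair of coordinates must be uniform on range(q) x range(q)."""
    for i in range(k):
        for j in range(i + 1, k):
            for a in range(q):
                for b in range(q):
                    mass = sum(w for z, w in dist.items() if z[i] == a and z[j] == b)
                    if mass != Fraction(1, q * q):
                        return False
    return True

def uniform_on_accepting_set(predicate, q, k):
    support = [z for z in product(range(q), repeat=k) if predicate(z)]
    return {z: Fraction(1, len(support)) for z in support}

def even_parity(z):
    return sum(z) % 2 == 0

def logical_or(z):
    return any(z)

xor_dist = uniform_on_accepting_set(even_parity, 2, 3)
or_dist = uniform_on_accepting_set(logical_or, 2, 3)
```

The uniform distribution on even-parity triples is balanced and pairwise independent, while the uniform distribution on the seven accepting inputs of OR is not; a different distribution supported inside the accepting set would have to be tried in that case.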

These results should be compared to prior work by Samorodnitsky and Trevisan [30] who, using
the Gowers norm, proved that the Max $k$-CSP problem has a hardness factor of $2^{\lceil \log_2 (k+1) \rceil}/2^k$,
which is $(k+1)/2^k$ for $k = 2^r - 1$, but can be as large as $2k/2^k$ for general $k$.
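
The dependence of this bound on $k$ is easy to tabulate. The following sketch evaluates $2^{\lceil \log_2 (k+1) \rceil}/2^k$ exactly, using the fact that the smallest power of two that is at least $k+1$ equals `1 << k.bit_length()`; the function names are introduced here for illustration.

```python
from fractions import Fraction

def samorodnitsky_trevisan_bound(k):
    """2^ceil(log2(k+1)) / 2^k, computed with integers only."""
    return Fraction(1 << k.bit_length(), 2 ** k)

def lower_limit(k):
    """The value (k+1)/2^k."""
    return Fraction(k + 1, 2 ** k)

# numerator of bound * 2^k for k = 1..8: the power of two jumps right after k = 2^r - 1
numerators = [samorodnitsky_trevisan_bound(k) * 2 ** k for k in range(1, 9)]
```

For $k = 7$ the bound equals $(k+1)/2^k = 8/128$, whereas for $k = 8$ it equals $2k/2^k = 16/256$.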

From the quantitative point of view [2] gives stronger hardness than [30] for Max $k$-CSP$_q$,
even in the already thoroughly explored $q = 2$ case. These improvements may seem very small,
being an improvement only by a multiplicative factor 2. However, it is well known that it is impossible
to get non-approximability results which are better than $(k+1)/2^k$, and thus, in this respect,
the hardness of $(k+4)/2^k$ assuming the Hadamard Conjecture is in fact optimal to within a very
small additive factor. Also, the results of [2] give approximation resistance of Max CSP($P$) for a
much larger variety of predicates (any $P$ containing a balanced pairwise independent distribution).

From a qualitative point of view, the analysis of [2] is very direct. Furthermore, it is general
enough to accommodate any domain $[q]$ with virtually no extra effort. Also, their proof uses the
main result of the current paper, i.e., bounds on expectations of products under certain types of
correlation, putting it in the same general framework as many other UGC-based hardness results,
in particular those for 2-CSPs.

In a second beautiful result by Raghavendra [26] the results of the current paper were used to
obtain very general hardness results for Max CSP($P$). In [26] it is shown that for every predicate
$P$ and for every approximation factor which is smaller than the UG-hardness of the problem, there
exists a polynomial time algorithm which achieves this approximation ratio. Thus for every $P$ the
UG-hardness of Max CSP($P$) is sharp. The proof of the results uses the results obtained here in
order to define and analyze the reduction from UG given the integrality gap of the corresponding
convex optimization problem. We note that for most predicates the UG-hardness of Max CSP($P$)
is unknown and therefore the results of [26] complement those of [I].
2 Correlated Spaces and Noise

In this section we define and study the notion of correlated spaces and noise operators in a general 
setting. 

2.1 Correlated Probability Spaces and Noise Operators 

We begin by defining noise operators and giving some basic examples. 

Definition 2.1 Let $(\Omega^{(1)} \times \Omega^{(2)}, P)$ be two correlated spaces. The Markov Operator associated with
$(\Omega^{(1)}, \Omega^{(2)})$ is the operator mapping $f \in L^p(\Omega^{(2)}, P)$ to $Tf \in L^p(\Omega^{(1)}, P)$ by:

$(Tf)(x) = E[f(Y) \mid X = x]$,

for $x \in \Omega^{(1)}$ and where $(X, Y) \in \Omega^{(1)} \times \Omega^{(2)}$ is distributed according to $P$.

Example 2.2 In order to define the Bonami-Beckner operator $T = T_\rho$ on a space $(\Omega, \mu)$, consider the
space $(\Omega \times \Omega, \nu)$ where $\nu(x, y) = (1 - \rho)\mu(x)\mu(y) + \rho \, \delta(x = y) \mu(x)$, where $\delta(x = y)$ is the function on
$\Omega \times \Omega$ which takes the value 1 when $x = y$, and 0 otherwise. In this case, the operator $T$ satisfies:

$(Tf)(x) = E[f(Y) \mid X = x]$, (17)

where the conditional distribution of $Y$ given $X = x$ is $\rho \delta_x + (1 - \rho)\mu$, where $\delta_x$ is the delta measure
on $x$.
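
On a finite space the joint measure $\nu$ and the operator $T$ are small matrices, and the stated conditional distribution gives $(Tf)(x) = \rho f(x) + (1 - \rho) E_\mu[f]$. The sketch below illustrates this with exact rational arithmetic; the helper names and the three-point measure are hypothetical choices made for the example.

```python
from fractions import Fraction

def bonami_beckner_joint(mu, rho):
    """nu(x, y) = (1 - rho) mu(x) mu(y) + rho * [x == y] * mu(x) on a finite space."""
    n = len(mu)
    return [[(1 - rho) * mu[x] * mu[y] + (rho * mu[x] if x == y else 0)
             for y in range(n)] for x in range(n)]

def markov_operator(joint, f):
    """(Tf)(x) = E[f(Y) | X = x] for a joint pmf joint[x][y]."""
    result = []
    for row in joint:
        px = sum(row)  # marginal probability of X = x
        result.append(sum(w * fy for w, fy in zip(row, f)) / px)
    return result

mu = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
rho = Fraction(1, 4)
nu = bonami_beckner_joint(mu, rho)
Tf = markov_operator(nu, [1, 0, 0])  # f is the indicator of the first point
```

Here the row sums of `nu` recover $\mu$, and `Tf` equals $(5/8, 3/8, 3/8)$, in agreement with $\rho f(x) + (1 - \rho) \mu(\{0\})$.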

Remark 2.3 The construction above may be generalized as follows. Given any Markov chain on
$\Omega$ that is reversible with respect to $\mu$, we may look at the measure $\nu$ on $\Omega \times \Omega$ defined by the Markov
chain. In this case $T$ is the Markov operator determined by the chain. The same construction
applies under the weaker condition that $T$ has $\mu$ as its stationary distribution.

It is straightforward to verify that: 

Proposition 2.4 Suppose that for each $1 \leq i \leq n$, $(\Omega_i^{(1)} \times \Omega_i^{(2)}, \mu_i)$ are correlated spaces and $T_i$
is the Markov operator associated with $\Omega_i^{(1)}$ and $\Omega_i^{(2)}$. Then $(\prod_{i=1}^n \Omega_i^{(1)} \times \prod_{i=1}^n \Omega_i^{(2)}, \prod_{i=1}^n \mu_i)$ defines
two correlated spaces and the Markov operator $T$ associated with them is given by $T = \otimes_{i=1}^n T_i$.

Example 2.5 For product spaces $(\prod_{i=1}^n \Omega_i, \prod_{i=1}^n \mu_i)$, the Bonami-Beckner operator $T = T_\rho$ is
defined by

$T = \otimes_{i=1}^n T_i$, (18)

where $T_i$ is the Bonami-Beckner operator on $(\Omega_i \times \Omega_i, \mu_i)$. This Markov operator is the one most
commonly discussed in previous work, see e.g. fl4\ EH [EE/. In a more recent work [?], the case of
$\Omega_i \times \Omega_i$ with $T_i$ a reversible Markov operator with respect to a measure $\mu_i$ on $\Omega_i$ was studied.

Example 2.6 In the context of the Majority is most Predictable Theorem 1.1 the underlying space
is $\Omega = \{\pm 1\} \times \{0, \pm 1\}$ where element $(x, y) \in \Omega$ corresponds to a voter with vote $x$ and a sampled
vote $y$ where either $y = x$ if the vote is queried or $y = 0$ otherwise. The probability measure $\mu$ is
given by

$\mu(x, y) = \frac{1}{2}(\delta(x = 1) + \delta(x = -1))(p \, \delta(y = x) + (1 - p) \delta(y = 0))$.

Note that the marginal distributions on $\Omega_S = \{0, \pm 1\}$ and $\Omega_V = \{\pm 1\}$ are given by

$\overline{\mu} = (1 - p)\delta_0 + \frac{p}{2}(\delta_{-1} + \delta_1)$, $\nu = \frac{1}{2}(\delta_{-1} + \delta_1)$,

and

$\nu(\cdot \mid \pm 1) = \delta_{\pm 1}$, $\nu(\cdot \mid 0) = \frac{1}{2}(\delta_1 + \delta_{-1})$.

Given independent copies $\mu_i$ of $\mu$ and $\nu_i$ of $\nu$, the measure $\mu = \otimes_{i=1}^n \mu_i$ corresponds to the
distribution of a sample of voters where each voter is sampled independently with probability $p$ and the
distribution of the voters is given by $\nu = \otimes_{i=1}^n \nu_i$.
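
The correlation between the vote and the sampled vote in this example can be computed in closed form. When one of the two spaces has exactly two atoms, the matrix $M(x, y) = P(x, y)/\sqrt{P(x) P(y)}$ has only two singular values, $1$ and the correlation, so the squared correlation is the squared Frobenius norm of $M$ minus 1. The sketch below relies on that observation; the function names are illustrative.

```python
import math

def sampling_joint(p):
    """Joint law of (vote x, sampled vote y); y = 0 encodes a vote that was not sampled."""
    joint = {}
    for x in (-1, 1):
        joint[(x, x)] = p / 2
        joint[(x, 0)] = (1 - p) / 2
    return joint

def correlation_two_point(joint):
    """Correlation of the two coordinates when the first takes exactly two values.
    Uses sigma_1 = 1 and sigma_1^2 + sigma_2^2 = ||M||_F^2."""
    px, py = {}, {}
    for (x, y), w in joint.items():
        px[x] = px.get(x, 0.0) + w
        py[y] = py.get(y, 0.0) + w
    assert len(px) == 2, "the shortcut needs a two-point first coordinate"
    frobenius_sq = sum(w * w / (px[x] * py[y]) for (x, y), w in joint.items())
    return math.sqrt(max(frobenius_sq - 1.0, 0.0))

rho_sample = correlation_two_point(sampling_joint(0.3))
```

For sampling probability $p$ the correlation comes out as $\sqrt{p}$, the quantity that appears inside the arcsin in Theorem 1.1.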

Example 2.7 The second non-reversible example is natural in the context of Condorcet voting.
For simplicity, we first discuss the case of 3 possible outcomes. The general case is discussed later.

Let $\tau$ denote the uniform measure on the set of permutations of the set $[3]$, denoted $S_3$. Note that
each element $\sigma \in S_3$ defines an element $f \in \{-1,1\}^{\binom{3}{2}}$ by letting $f(i,j) = \mathrm{sgn}(\sigma(i) - \sigma(j))$. The
measure $P$ so defined on the three coordinates $f(1,2)$, $f(2,3)$, $f(3,1)$ defines 3 correlated probability spaces $(\{\pm 1\}^{\binom{3}{2}}, P)$.

Note that the projection of $P$ to each coordinate is uniform. Moreover, $f(1,2) = f(2,3) = 1$ forces $\sigma(1) > \sigma(2) > \sigma(3)$ and hence $f(3,1) = -1$, and symmetrically for $f(1,2) = f(2,3) = -1$. Therefore

\[ P(f(3,1) = 1 \mid f(1,2) = f(2,3) = 1) = 0, \qquad P(f(3,1) = -1 \mid f(1,2) = f(2,3) = -1) = 0, \]

and

\[ P(f(3,1) = \pm 1 \mid f(1,2) \ne f(2,3)) = 1/2. \]
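
The conditional probabilities in Example 2.7 can be checked by brute force. The following is a minimal sketch, not part of the original argument: it enumerates the six permutations, builds the vector $(f(1,2), f(2,3), f(3,1))$ for each, and computes the marginals and conditional probabilities exactly. The helper names (`comparison_vector`, `cond_prob`) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def sgn(v):
    return 1 if v > 0 else -1


def comparison_vector(perm):
    """Return (f(1,2), f(2,3), f(3,1)) with f(i,j) = sgn(sigma(i) - sigma(j))."""
    sigma = {i + 1: perm[i] for i in range(3)}
    f = lambda i, j: sgn(sigma[i] - sigma[j])
    return (f(1, 2), f(2, 3), f(3, 1))


# One comparison vector per permutation; the uniform measure on S_3 puts mass 1/6 on each.
rankings = [comparison_vector(p) for p in permutations([1, 2, 3])]


def cond_prob(event, given):
    """P(event | given) under the uniform measure on the six rankings."""
    pool = [r for r in rankings if given(r)]
    return Fraction(sum(1 for r in pool if event(r)), len(pool))


# Each coordinate is uniform on {-1, +1}.
marginals = [Fraction(sum(1 for r in rankings if r[c] == 1), 6) for c in range(3)]

# Transitivity rules out the two constant vectors (1,1,1) and (-1,-1,-1).
p_cycle_pos = cond_prob(lambda r: r[2] == 1, lambda r: r[0] == 1 and r[1] == 1)
p_cycle_neg = cond_prob(lambda r: r[2] == -1, lambda r: r[0] == -1 and r[1] == -1)

# When the first two comparisons disagree, the third is a fair coin.
p_mixed = cond_prob(lambda r: r[2] == 1, lambda r: r[0] != r[1])
```

The six vectors produced are exactly the elements of $\{-1,1\}^3$ other than the two constant ones, which is the usual encoding of transitive rankings on three candidates.
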
2.2 Properties of Correlated Spaces and Markov Operators 

Here we derive properties of correlated spaces and Markov operators that will be repeatedly used
below. We start with the following, which was already known to Rényi [27].

Lemma 2.8 Let $(\Omega^{(1)} \times \Omega^{(2)}, P)$ be two correlated spaces. Let $f$ be an $\Omega^{(2)}$ measurable function
with $E[f] = 0$ and $E[f^2] = 1$. Then among all $g$ that are $\Omega^{(1)}$ measurable satisfying $E[g^2] = 1$, a
maximizer of $|E[fg]|$ is given by

\[ g = \frac{Tf}{\sqrt{E[(Tf)^2]}}, \tag{19} \]

where $T$ is the Markov operator associated with $(\Omega^{(1)}, \Omega^{(2)})$. Moreover,

\[ |E[gf]| = \frac{|E[f \, Tf]|}{\sqrt{E[(Tf)^2]}} = \sqrt{E[(Tf)^2]}. \tag{20} \]

Proof: To prove (19) let $h$ be an $\Omega^{(1)}$ measurable function with $\|h\|_2 = 1$. Write $h = \alpha g + \beta h'$
where $\alpha^2 + \beta^2 = 1$ and $h'$, with $\|h'\|_2 = 1$, is orthogonal to $g$. From the properties of conditional expectation
it follows that $E[fh'] = 0$. Therefore we may choose an optimizer satisfying $\alpha = \pm 1$. Equation (20)
follows since $Tf$ is a conditional expectation. The same reasoning shows that $E[fg] = 0$ for every
$\Omega^{(1)}$ measurable function $g$ if $Tf$ is identically 0.

□ 
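
Lemma 2.8 can be illustrated numerically on a small example. The sketch below is an assumption-laden illustration rather than part of the proof: the joint table `P` on a 3-point by 2-point space is hypothetical, $Tf$ is computed as the conditional expectation $E[f(Y) \mid X = a]$, and randomly drawn unit-norm competitors $h$ are compared against $g = Tf / \|Tf\|_2$.

```python
import math
import random

# Hypothetical joint distribution P[a][b] on Omega1 = {0,1,2}, Omega2 = {0,1}.
P = [[0.20, 0.05],
     [0.10, 0.25],
     [0.15, 0.25]]
p1 = [sum(row) for row in P]                              # marginal on Omega1
p2 = [sum(P[a][b] for a in range(3)) for b in range(2)]   # marginal on Omega2

# A mean-zero, unit-variance function on the two-point space Omega2.
f = [math.sqrt(p2[1] / p2[0]), -math.sqrt(p2[0] / p2[1])]

# Markov operator: (Tf)(a) = E[f(Y) | X = a].
Tf = [sum(P[a][b] * f[b] for b in range(2)) / p1[a] for a in range(3)]
norm_Tf = math.sqrt(sum(p1[a] * Tf[a] ** 2 for a in range(3)))
g = [v / norm_Tf for v in Tf]


def corr(h):
    """E[h(X) f(Y)] under the joint distribution."""
    return sum(P[a][b] * h[a] * f[b] for a in range(3) for b in range(2))


best = abs(corr(g))   # should equal sqrt(E[(Tf)^2]) by (20)

# Compare against random unit-norm functions h on Omega1.
rng = random.Random(0)
max_other = 0.0
for _ in range(2000):
    h = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm_h = math.sqrt(sum(p1[a] * h[a] ** 2 for a in range(3)))
    max_other = max(max_other, abs(corr([v / norm_h for v in h])))
```

In this example no sampled competitor beats $g$, as the Cauchy-Schwarz argument in the proof predicts.
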

The following lemma is useful in bounding $\rho(\Omega^{(1)}, \Omega^{(2)}; P)$ from Definition 1.9 in generic
situations. Roughly speaking, it shows that connectivity of the support of $P$ on correlated spaces
$\Omega^{(1)} \times \Omega^{(2)}$ implies that $\rho < 1$.






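Lemma 2.9 Let $(\Omega^{(1)} \times \Omega^{(2)}, P)$ be two correlated spaces such that the probability of the smallest
atom in $\Omega^{(1)} \times \Omega^{(2)}$ is at least $\alpha > 0$. Define a bi-partite graph $G = (\Omega^{(1)}, \Omega^{(2)}, E)$ where $(a,b) \in
\Omega^{(1)} \times \Omega^{(2)}$ satisfies $(a,b) \in E$ if $P(a,b) > 0$. Then if $G$ is connected then

\[ \rho(\Omega^{(1)}, \Omega^{(2)}; P) \le 1 - \alpha^2/2. \]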

Proof: For the proof it would be useful to consider $G' = (\Omega^{(1)} \cup \Omega^{(2)}, E)$, a weighted directed graph
where the weight $W(a,b)$ of the directed edge from $a$ to $b$ is $P[b \mid a]$ and the weight of the directed
edge from $b$ to $a$ is $W(b,a) = P[a \mid b]$. Note that the minimal non-zero weight must be at least $\alpha$
and that $W(a \to b) > 0$ iff $W(b \to a) > 0$. This latter fact implies that $G'$ is strongly connected.
Note furthermore that $G'$ is bi-partite.

Let $A$ be the transition probability matrix defined by the weighted graph $G'$. Since $G'$ is
connected and $W(a,b) \ge \alpha$ for all $a$ and $b$ such that $W(a \to b) > 0$, it follows by Cheeger's
inequality that the spectral gap of $A$ is at least $\alpha^2/2$. Since $G'$ is connected and bi-partite, the
multiplicities of the eigenvalues $\pm 1$ are both 1. Corresponding eigenfunctions are the constant 1
function and the function taking the value 1 on $\Omega^{(1)}$ and the value $-1$ on $\Omega^{(2)}$.

In order to bound $\rho$ it suffices by Lemma 2.8 to bound $\|Af\|_2$ for a function $f$ that is supported
on $\Omega^{(2)}$ and satisfies $E[f] = 0$. Note that such a function is orthogonal to the eigenvectors of $A$
corresponding to the eigenvalues $-1$ and 1. It therefore follows that $\|Af\|_2 \le (1 - \alpha^2/2)\|f\|_2$ as
needed. □
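
As a sanity check of Lemma 2.9 in the smallest case, the sketch below scans joint distributions of two binary variables. It relies on a standard fact that is not stated in the text: for two binary variables the maximal correlation equals the absolute Pearson correlation of the indicators, because every function of a binary variable is affine in its indicator. The grid of distributions (atoms in multiples of 1/10) is an arbitrary illustrative choice.

```python
import math
from itertools import product


def rho_2x2(P):
    """Maximal correlation of two binary variables with joint table P (marginals > 0)."""
    p = [P[0][0] + P[0][1], P[1][0] + P[1][1]]
    q = [P[0][0] + P[1][0], P[0][1] + P[1][1]]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return abs(det) / math.sqrt(p[0] * p[1] * q[0] * q[1])


def is_connected_2x2(P):
    # The bipartite support graph on {a0,a1} and {b0,b1} is connected
    # iff at least three of the four atoms have positive probability.
    return sum(1 for a, b in product(range(2), repeat=2) if P[a][b] > 0) >= 3


# Scan all tables with entries in multiples of 1/10 and record the worst slack
# (1 - alpha^2/2) - rho, where alpha is the smallest positive atom.
worst_slack = 1.0
for w in product(range(11), repeat=4):
    if sum(w) != 10:
        continue
    P = [[w[0] / 10, w[1] / 10], [w[2] / 10, w[3] / 10]]
    if not is_connected_2x2(P):
        continue
    alpha = min(v for v in w if v > 0) / 10
    worst_slack = min(worst_slack, (1 - alpha ** 2 / 2) - rho_2x2(P))

example_rho = rho_2x2([[0.4, 0.1], [0.1, 0.4]])   # symmetric example
```

On this grid the bound holds with room to spare, which is consistent with the lemma being a generic (not sharp) estimate.
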

One nice property of Markov operators that will be used below is that they respect the Efron-Stein
decomposition. Given a vector $x$ in an $n$ dimensional product space and $S \subseteq [n]$ we write $x_S$
for the vector $(x_i : i \in S)$. Given probability spaces $\Omega_1, \ldots, \Omega_n$, we use the convention of writing
$X_i$ for a random variable that is distributed according to the measure of $\Omega_i$ and $x_i$ for an element
of $\Omega_i$. We will also write $X_S$ for $(X_i : i \in S)$.

Definition 2.10 Let $(\Omega_1, \mu_1), \ldots, (\Omega_n, \mu_n)$ be discrete probability spaces and let $(\Omega, \mu) = \prod_{i=1}^n (\Omega_i, \mu_i)$.
The Efron-Stein decomposition of $f : \Omega \to \mathbb{R}$ is given by

\[ f(x) = \sum_{S \subseteq [n]} f_S(x_S), \tag{21} \]

where the functions $f_S$ satisfy:

• $f_S$ depends only on $x_S$.

• For all $S \not\subseteq S'$ and all $x_{S'}$ it holds that:

\[ E[f_S \mid X_{S'} = x_{S'}] = 0. \]

It is well known that the Efron-Stein decomposition exists and that it is unique [9]. We quickly
recall the proof of existence. The function $f_S$ is given by:

\[ f_S(x) = \sum_{S' \subseteq S} (-1)^{|S \setminus S'|} E[f(X) \mid X_{S'} = x_{S'}] \]

which implies

\[ \sum_S f_S(x) = \sum_{S'} E[f \mid X_{S'} = x_{S'}] \sum_{S : S' \subseteq S} (-1)^{|S \setminus S'|} = E[f \mid X_{[n]} = x_{[n]}] = f(x). \]

Moreover, for $S \not\subseteq S'$ we have $E[f_S \mid X_{S'} = x_{S'}] = E[f_S \mid X_{S' \cap S} = x_{S' \cap S}]$ and for $S'$ that is a strict
subset of $S$ we have:

\[ E[f_S \mid X_{S'} = x_{S'}] = \sum_{S'' \subseteq S} (-1)^{|S \setminus S''|} E[f(X) \mid X_{S'' \cap S'} = x_{S'' \cap S'}] \]

\[ = \sum_{S'' \subseteq S'} E[f(X) \mid X_{S''} = x_{S''}] \sum_{S'' \subseteq \tilde{S} \subseteq S'' \cup (S \setminus S')} (-1)^{|S \setminus \tilde{S}|} = 0. \]
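
The existence formula can be exercised directly on a small product space. The following sketch is illustrative only: the three coordinate spaces, their measures `mus`, and the test function `f` are hypothetical choices. It computes each $f_S$ by inclusion-exclusion over conditional expectations and then checks the two defining properties numerically.

```python
from itertools import combinations, product

# Hypothetical product space: coordinate i takes values 0..len(mus[i])-1 with weights mus[i].
mus = [[0.3, 0.7], [0.2, 0.5, 0.3], [0.6, 0.4]]
n = len(mus)
points = list(product(*[range(len(m)) for m in mus]))


def weight(x):
    w = 1.0
    for i, xi in enumerate(x):
        w *= mus[i][xi]
    return w


def f(x):
    return x[0] * x[1] + (x[2] - x[0]) ** 2 + 0.5 * x[1]


def subsets(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def cond_exp(g, S, x):
    """E[g(X) | X_S = x_S]: average g over the coordinates outside S."""
    total, mass = 0.0, 0.0
    for y in points:
        if all(y[i] == x[i] for i in S):
            total += weight(y) * g(y)
            mass += weight(y)
    return total / mass


def efron_stein(S, x):
    """f_S(x) as the signed sum over subsets S' of S of E[f | X_{S'} = x_{S'}]."""
    return sum((-1) ** (len(S) - len(Sp)) * cond_exp(f, Sp, x) for Sp in subsets(S))


all_S = subsets(range(n))

# Property 1: the components sum back to f.
recon_err = max(abs(sum(efron_stein(S, x) for S in all_S) - f(x)) for x in points)

# Property 2: E[f_S | X_{S'} = x_{S'}] = 0 whenever S is not contained in S'.
vanish_err = max(
    abs(cond_exp(lambda y, S=S: efron_stein(S, y), Sp, x))
    for S in all_S for Sp in all_S if not S <= Sp for x in points
)
```

Both errors should be at the level of floating-point rounding; the code is a direct transcription of the inclusion-exclusion formula and makes no attempt at efficiency.
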

We now prove that the Efron-Stein decomposition "commutes" with Markov operators. 

Proposition 2.11 Let $(\Omega_i^{(1)} \times \Omega_i^{(2)}, P_i)$ be correlated spaces and let $T_i$ be the Markov operator
associated with $\Omega_i^{(1)}$ and $\Omega_i^{(2)}$ for $1 \le i \le n$. Let

\[ \Omega^{(1)} = \prod_{i=1}^n \Omega_i^{(1)}, \qquad \Omega^{(2)} = \prod_{i=1}^n \Omega_i^{(2)}, \qquad P = \prod_{i=1}^n P_i, \qquad T = \otimes_{i=1}^n T_i. \]

Suppose $f \in L^2(\Omega^{(2)})$ has Efron-Stein decomposition (21). Then the Efron-Stein decomposition of
$Tf$ satisfies:

\[ (Tf)_S = T(f_S). \]

Proof: Clearly $T(f_S)$ is a function of $x_S$ only. Moreover, for all $S \not\subseteq S'$ and $x^0_{S'}$ it holds that

\[ E[(T(f_S))(X) \mid X_{S'} = x^0_{S'}] = E[f_S(Y) \mid X_{S'} = x^0_{S'}] = E\big[E[f_S(Y) \mid Y_{S'}] \mid X_{S'} = x^0_{S'}\big] = 0, \]

where the second equality follows from the fact that $Y$ is independent of $X_{S'}$ conditioned on $Y_{S'}$. □

We next derive a useful bound showing that in the setting above if $\rho(\Omega_i^{(1)}, \Omega_i^{(2)}; P_i) < 1$ for all
$i$ then $Tf$ depends on the "low degree expansion" of $f$.

Proposition 2.12 Assume the setting of Proposition 2.11 and that further for all $i$ it holds that
$\rho(\Omega_i^{(1)}, \Omega_i^{(2)}; P_i) \le \rho_i$. Then for all $f$ it holds that

\[ \|T(f_S)\|_2 \le \Big(\prod_{i \in S} \rho_i\Big) \|f_S\|_2. \]

Proof: Without loss of generality it suffices to prove the statement of Proposition 2.12 for $S = [n]$.
Thus our goal is to show that

\[ \|T(f_{[n]})\|_2 \le \Big(\prod_{i=1}^n \rho_i\Big) \|f_{[n]}\|_2 \]

(from now on $S$ will denote a set different than $[n]$). For each $0 \le r \le n$, let $T^{(r)}$ denote the
following operator. $T^{(r)}$ maps a function $g$ of $z = (z_1, \ldots, z_n) = (x_1, \ldots, x_{r-1}, y_r, \ldots, y_n)$ to a
function $T^{(r)} g$ of $w = (w_1, \ldots, w_n) = (x_1, \ldots, x_r, y_{r+1}, \ldots, y_n)$ defined as follows:

\[ T^{(r)} g(w) = E[g(Z) \mid W = w]. \]

(Here $Z = (X_1, \ldots, X_{r-1}, Y_r, \ldots, Y_n)$ and similarly $W$.)

Let $g$ be a function such that for any subset $S \subsetneq [n]$ and all $z_S$,

\[ E[g(Z) \mid Z_S = z_S] = 0. \]

We claim that

\[ \|T^{(r)} g\|_2 \le \rho_r \|g\|_2 \tag{22} \]

and that for all subsets $S \subsetneq [n]$ it holds that

\[ E[(T^{(r)} g)(W) \mid W_S = w_S] = 0. \tag{23} \]

Note that (22) and (23) together imply the desired bound as $T = T^{(n)} \cdots T^{(1)}$.
For (22) note that if $S = [n] \setminus \{r\}$ and $f = T^{(r)} g$ then by Lemma 2.8

\[ E[f^2(W) \mid Z_S = z_S] = E[g(Z) f(W) \mid Z_S = z_S] \le \rho_r \sqrt{E[f^2(W) \mid Z_S = z_S] \, E[g^2(Z) \mid Z_S = z_S]}. \]

So

\[ E[f^2(W) \mid Z_S = z_S] \le \rho_r^2 \, E[g^2(Z) \mid Z_S = z_S], \]

which gives $\|f\|_2 \le \rho_r \|g\|_2$ by integration.

For (23) we note that if $S \subsetneq [n]$ then

\[ E[f(W) \mid W_S = w_S] = E[g(Z) \mid W_S = w_S] = E\big[E[g(Z) \mid Z_S] \mid W_S = w_S\big] = 0. \]

This concludes the proof. □

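Proposition 2.13 Assume the setting of Proposition 2.11. Then

\[ \rho\Big(\prod_{i=1}^n \Omega_i^{(1)}, \prod_{i=1}^n \Omega_i^{(2)}; \prod_{i=1}^n P_i\Big) = \max_i \rho(\Omega_i^{(1)}, \Omega_i^{(2)}; P_i). \]

Proof: Let $f \in L^2(\prod_{i=1}^n \Omega_i^{(2)})$ with $E[f] = 0$ and $\mathrm{Var}[f] = 1$. Expand $f$ according to its
Efron-Stein decomposition

\[ f = \sum_{S \ne \emptyset} f_S, \]

($f_\emptyset = 0$ since $E[f] = 0$). Then by Propositions 2.11 and 2.12

\[ E[(Tf)^2] = E\Big[\Big(T\Big(\sum_S f_S\Big)\Big)^2\Big] = E\Big[\Big(\sum_S T f_S\Big)^2\Big] = \sum_S E[(T f_S)^2] \le \sum_{S \ne \emptyset} \prod_{i \in S} \rho_i^2 \, \|f_S\|_2^2 \]

\[ \le \max_i \rho_i^2 \sum_S \|f_S\|_2^2 = \max_i \rho_i^2. \]

The other inequality is trivial. □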
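
The tensorization statement of Proposition 2.13 can be observed numerically for a product of two binary correlated spaces. This is a sketch under stated assumptions: the two joint tables are hypothetical, the maximal correlation of the product space is estimated by power iteration on the normalized joint matrix (its second singular value), and the single-coordinate values use the closed form for binary variables.

```python
import math


def rho_2x2(P):
    """Closed form for two binary variables: |Pearson correlation| of the indicators."""
    p = [P[0][0] + P[0][1], P[1][0] + P[1][1]]
    q = [P[0][0] + P[1][0], P[0][1] + P[1][1]]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return abs(det) / math.sqrt(p[0] * p[1] * q[0] * q[1])


def product_joint(P, Q):
    """Joint table of the product space: rows (a1, a2), columns (b1, b2)."""
    return [[P[a1][b1] * Q[a2][b2] for b1 in range(2) for b2 in range(2)]
            for a1 in range(2) for a2 in range(2)]


def max_correlation(P, iters=300):
    """Second singular value of M[a][b] = P[a][b] / sqrt(p1[a] p2[b]) via power iteration."""
    rows, cols = len(P), len(P[0])
    p1 = [sum(P[a]) for a in range(rows)]
    p2 = [sum(P[a][b] for a in range(rows)) for b in range(cols)]
    M = [[P[a][b] / math.sqrt(p1[a] * p2[b]) for b in range(cols)] for a in range(rows)]
    v0 = [math.sqrt(q) for q in p2]   # top right-singular vector (singular value 1)

    def project(w):
        c = sum(x * y for x, y in zip(w, v0))
        return [x - c * y for x, y in zip(w, v0)]

    w = project([(-1.0) ** b * (b + 1) for b in range(cols)])
    lam = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            break
        w = [x / norm for x in w]
        u = [sum(M[a][b] * w[b] for b in range(cols)) for a in range(rows)]
        z = project([sum(M[a][b] * u[a] for a in range(rows)) for b in range(cols)])
        lam = sum(x * y for x, y in zip(w, z))   # Rayleigh quotient, approximates rho^2
        w = z
    return math.sqrt(max(lam, 0.0))


P1 = [[0.4, 0.1], [0.1, 0.4]]
P2 = [[0.3, 0.2], [0.1, 0.4]]
rho1, rho2 = rho_2x2(P1), rho_2x2(P2)
rho_prod = max_correlation(product_joint(P1, P2))
```

In this example the estimate for the product space agrees with $\max(\rho_1, \rho_2)$ up to numerical error. Power iteration is used only to keep the sketch dependency-free; any singular value routine would serve equally well.
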

3 Background: Influences and Hypercontractivity 

In this section we recall and generalize some definitions and results from [22]. In particular, the
generalizations allow us to study non-reversible Markov operators and correlated ensembles. For
the reader who is familiar with [22] it suffices to look at subsections 3.3 and 3.5.



3.1 Influences and noise stability in product spaces 

Let $(\Omega_1, \mu_1), \ldots, (\Omega_n, \mu_n)$ be probability spaces and let $(\Omega, \mu)$ denote the product probability space.
Let

\[ f = (f_1, \ldots, f_k) : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}^k. \]

Definition 3.1 The influence of the $i$th coordinate on $f$ is

\[ \mathrm{Inf}_i(f) = \sum_{1 \le j \le k} E\big[\mathrm{Var}[f_j \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]\big]. \]

3.2 Multi-linear Polynomials



In this sub-section we recall and slightly generalize the setup and notation used in [22]. Recall that
we are interested in functions on products of finite probability spaces, $f : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$. For
each $i$, the space of all functions $\Omega_i \to \mathbb{R}$ can be expressed as the span of a finite set of orthonormal
random variables, $X_{i,0} = 1, X_{i,1}, X_{i,2}, X_{i,3}, \ldots$; then $f$ can be written as a multilinear polynomial
in the $X_{i,j}$'s. In fact, it will be convenient for us to mostly disregard the $\Omega_i$'s and work directly with
sets of orthonormal random variables; in this case, we can even drop the restriction of finiteness.
We thus begin with the following definition:

Definition 3.2 [22] We call a collection of finitely many orthonormal real random variables, one
of which is the constant 1, an orthonormal ensemble. We will write a typical sequence of $n$
orthonormal ensembles as $\mathcal{X} = (\mathcal{X}_1, \ldots, \mathcal{X}_n)$, where $\mathcal{X}_i = \{X_{i,0} = 1, X_{i,1}, \ldots, X_{i,m_i}\}$. We call
a sequence of orthonormal ensembles $\mathcal{X}$ independent if the ensembles are independent families of
random variables.

We will henceforth be concerned only with independent sequences of orthonormal ensembles,
and we will call these sequences of ensembles, for brevity. Similarly, when writing an ensemble we
will always mean an orthonormal ensemble.

Remark 3.3 [22] Given a sequence of independent random variables $X_1, \ldots, X_n$ with $E[X_i] = 0$
and $E[X_i^2] = 1$, we can view them as a sequence of ensembles $\mathcal{X}$ by renaming $X_i = X_{i,1}$ and setting
$X_{i,0} = 1$ as required.

Definition 3.4 [22] We denote by $\mathcal{G}$ the Gaussian sequence of ensembles, in which $\mathcal{G}_i = \{G_{i,0} =
1, G_{i,1}, G_{i,2}, \ldots, G_{i,m_i}\}$ and all $G_{i,j}$'s with $j \ge 1$ are independent standard Gaussians.

The Gaussian ensembles discussed in this paper will often have $m_i$ chosen to match the $m_i$ of a
given ensemble.

As mentioned, we will be interested in multilinear polynomials over sequences of ensembles. 
By this we mean sums of products of the random variables, where each product is obtained by 
multiplying one random variable from each ensemble. 

Definition 3.5 [22] A multi-index $\sigma$ is a sequence $(\sigma_1, \ldots, \sigma_n)$ in $\mathbb{N}^n$. The degree of $\sigma$, denoted
$|\sigma|$, is $|\{i \in [n] : \sigma_i > 0\}|$. Given a doubly-indexed set of indeterminates $\{x_{i,j}\}_{i \in [n], j \in \mathbb{N}}$, we write
$x_\sigma$ for the monomial $\prod_{i=1}^n x_{i,\sigma_i}$. We now define a multilinear polynomial over such a set of
indeterminates to be any expression

\[ Q(x) = \sum_\sigma c_\sigma x_\sigma \tag{24} \]

where the $c_\sigma$'s are real constants, all but finitely many of which are zero. The degree of $Q(x)$ is
$\max\{|\sigma| : c_\sigma \ne 0\}$, at most $n$. We also use the notation

\[ Q^{\le d}(x) = \sum_{|\sigma| \le d} c_\sigma x_\sigma \]

and, analogously, $Q^{=d}(x)$ and $Q^{>d}(x)$.

Naturally, we will consider applying multilinear polynomials $Q$ to sequences of ensembles $\mathcal{X}$;
the distribution of these random variables $Q(\mathcal{X})$ is the subject of our invariance principle. Since
$Q(\mathcal{X})$ can be thought of as a function on a product space $\Omega_1 \times \cdots \times \Omega_n$ as described at the beginning
of this section, there is a consistent way to define the notions of influences, $T_\rho$, and noise stability
from Section 3.1. For example, the "influence of the $i$th ensemble on $Q$" is

\[ \mathrm{Inf}_i(Q(\mathcal{X})) = E\big[\mathrm{Var}[Q(\mathcal{X}) \mid \mathcal{X}_1, \ldots, \mathcal{X}_{i-1}, \mathcal{X}_{i+1}, \ldots, \mathcal{X}_n]\big]. \]

Using independence and orthonormality, it is easy to show the following formulas: 

Proposition 3.6 Let $\mathcal{X}$ be a sequence of ensembles and $Q$ a multilinear polynomial as in (24).
Then

\[ E[Q(\mathcal{X})] = c_0; \qquad E[Q(\mathcal{X})^2] = \sum_\sigma c_\sigma^2; \qquad \mathrm{Var}[Q(\mathcal{X})] = \sum_{|\sigma| > 0} c_\sigma^2; \]

and

\[ \mathrm{Inf}_i(Q(\mathcal{X})) = \sum_{\sigma : \sigma_i > 0} c_\sigma^2. \]

For $\rho \in [0,1]$ we define the operator $T_\rho$ as acting formally on multilinear polynomials $Q(x)$ as
in (24) by

\[ (T_\rho Q)(x) = \sum_\sigma \rho^{|\sigma|} c_\sigma x_\sigma. \tag{25} \]

We note that the definitions in (17) and (18) are consistent with the definition in (25) in the sense that
for any ensemble $\mathcal{X}$ the two definitions result in the same function $(T_\rho Q)(\mathcal{X})$.

We finally recall the notion of "low-degree influences", a notion that has proven crucial in the 
analysis of PCPs in hardness of approximation in computer science (see, e.g., [18] ). 

Definition 3.7 [22] The $d$-low-degree influence of the $i$th ensemble on $Q(\mathcal{X})$ is

\[ \mathrm{Inf}_i^{\le d}(Q(\mathcal{X})) = \mathrm{Inf}_i^{\le d}(Q) = \sum_{\sigma : |\sigma| \le d, \sigma_i > 0} c_\sigma^2. \]

Note that this gives a way to define low-degree influences $\mathrm{Inf}_i^{\le d}(f)$ for functions $f : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$
on finite product spaces.

There isn't an especially natural interpretation of $\mathrm{Inf}_i^{\le d}(f)$. However, the notion is important for
PCPs due to the fact that a function with variance 1 cannot have too many coordinates with
substantial low-degree influence; this is reflected in the following easy proposition:

Proposition 3.8 [22] Suppose $Q$ is a multilinear polynomial as in (24). Then

\[ \sum_i \mathrm{Inf}_i^{\le d}(Q) \le d \cdot \mathrm{Var}[Q]. \]

The proof follows since:

\[ \sum_i \mathrm{Inf}_i^{\le d}(Q) = \sum_i \sum_{\sigma : |\sigma| \le d, \sigma_i > 0} c_\sigma^2 = \sum_{\sigma : 0 < |\sigma| \le d} |\sigma| c_\sigma^2 \le d \sum_{\sigma : 0 < |\sigma|} c_\sigma^2 = d \, \mathrm{Var}[Q]. \]
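
The bookkeeping behind Propositions 3.6 and 3.8 is easy to run on a concrete polynomial. The sketch below assumes the simplest ensembles, $\mathcal{X}_i = \{1, x_i\}$ with $x_i$ a uniform $\pm 1$ bit, so that multi-indices live in $\{0,1\}^n$; the coefficient table `coeffs` is a hypothetical example. It compares the coefficient formula for the variance against a brute-force average and checks the bound on the total low-degree influence.

```python
from itertools import product

# Hypothetical coefficients c_sigma for n = 4 ensembles {1, x_i} of uniform +-1 bits;
# sigma_i = 1 means that x_i appears in the monomial.
n = 4
coeffs = {
    (0, 0, 0, 0): 0.5,
    (1, 0, 0, 0): 0.3,
    (0, 1, 1, 0): -0.4,
    (1, 1, 0, 1): 0.2,
    (1, 1, 1, 1): 0.6,
}


def degree(sigma):
    return sum(1 for s in sigma if s > 0)


def low_degree_influence(i, d):
    """Sum of c_sigma^2 over sigma with |sigma| <= d and sigma_i > 0."""
    return sum(c * c for s, c in coeffs.items() if s[i] > 0 and degree(s) <= d)


# Variance from the coefficient formula of Proposition 3.6.
variance = sum(c * c for s, c in coeffs.items() if degree(s) > 0)


def Q(x):
    total = 0.0
    for s, c in coeffs.items():
        term = c
        for i in range(n):
            if s[i]:
                term *= x[i]
        total += term
    return total


# Brute-force mean and variance over the uniform measure on {-1, 1}^n.
values = [Q(x) for x in product([-1, 1], repeat=n)]
mean = sum(values) / len(values)
brute_variance = sum((v - mean) ** 2 for v in values) / len(values)

# Total d-low-degree influence for each d, to compare with d * Var[Q].
totals = {d: sum(low_degree_influence(i, d) for i in range(n)) for d in range(1, n + 1)}
```

For `d = n` the total equals $\sum_\sigma |\sigma| c_\sigma^2$, which shows where the factor $d$ in Proposition 3.8 comes from.
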






3.3 Vector valued multi-linear polynomials 

For the invariance principle discussed here we will need to consider vector-valued multi-linear 
polynomials. 

Definition 3.9 A $k$-dimensional multilinear polynomial over a set of indeterminates is given by

\[ Q = (Q_1, \ldots, Q_k) \tag{26} \]

where each $Q_j$ is a multi-linear polynomial as in (24). The degree of $Q$ is the maximal degree of
the $Q_j$'s.

Definition 3.10 We adopt the standard notation and write $\|Q\|_q$ for $E^{1/q}[\sum_{i=1}^k |Q_i|^q]$; we write
$\mathrm{Var}[Q]$ for $E[\|Q - E[Q]\|_2^2]$ and

\[ \mathrm{Inf}_i(Q(\mathcal{X})) = \sum_{j=1}^k \mathrm{Inf}_i(Q_j(\mathcal{X})). \]

Using these definitions, it is easy to see that

Proposition 3.11 Let $\mathcal{X}$ be a sequence of ensembles and $Q = (Q_1, \ldots, Q_k)$ a $k$ dimensional
multilinear polynomial where $Q_j$ is defined as in (24) with $c_\sigma^j$ as its coefficients. Then:

\[ E[Q(\mathcal{X})] = (c_0^1, \ldots, c_0^k); \qquad \|Q(\mathcal{X})\|_2^2 = \sum_{j,\sigma} |c_\sigma^j|^2; \qquad \mathrm{Var}[Q(\mathcal{X})] = \sum_{j, |\sigma| > 0} |c_\sigma^j|^2. \]

Finally, we recall the standard multi-index notation associated with $k$-dimensional multi-linear
polynomials. A multi-index $\mathbf{i}$ of dimension $k$ is a vector $(i_1, \ldots, i_k)$, where each $i_j$ is an integer.
We write $|\mathbf{i}|$ for $i_1 + \cdots + i_k$ and $\mathbf{i}!$ for $i_1! i_2! \cdots i_k!$. Given a function $\psi$ of $k$ variables, we will write
$\psi^{(\mathbf{i})}$ for the partial derivative of $\psi$ taken $i_1$ times with respect to the first variable, $i_2$ with respect
to the second etc. (we will only consider functions $\psi$ that are smooth enough that the order of
derivatives does not matter). We will also write $Q^{\mathbf{i}}$ for the product $Q_1^{i_1} \cdots Q_k^{i_k}$.

3.4 Hypercontractivity 

As in [22] the invariance principle requires that the ensembles involved are hypercontractive. Recall
that $Y$ is $(2, q, \eta)$-hypercontractive with some $\eta \in (0, 1)$ if and only if $E[Y] = 0$ and $E[|Y|^q] < \infty$.
Also, if $Y$ is $(2, q, \eta)$-hypercontractive then $\eta \le (q - 1)^{-1/2}$.

Definition 3.12 Let $\mathcal{X}$ be a sequence of ensembles. For $1 \le p \le q < \infty$ and $0 < \eta < 1$ we say
that $\mathcal{X}$ is $(p, q, \eta)$-hypercontractive if

\[ \|(T_\eta Q)(\mathcal{X})\|_q \le \|Q(\mathcal{X})\|_p \]

for every multilinear polynomial $Q$ over $\mathcal{X}$.

Since $T_\eta$ is a contractive semi-group, we have

Remark 3.13 If $\mathcal{X}$ is $(p, q, \eta)$-hypercontractive then it is $(p, q, \eta')$-hypercontractive for any
$0 < \eta' \le \eta$.

There is a related notion of hypercontractivity for sets of random variables which considers all
polynomials in the variables, not just multilinear polynomials; see, e.g., Janson [13]. We summarize
some of the basic properties below, see [22] for details.

Proposition 3.14 [22] Suppose $\mathcal{X}$ is a sequence of $n_1$ ensembles and $\mathcal{Y}$ is an independent sequence
of $n_2$ ensembles. Assume both are $(p, q, \eta)$-hypercontractive. Then the sequence of ensembles
$\mathcal{X} \cup \mathcal{Y} = (\mathcal{X}_1, \ldots, \mathcal{X}_{n_1}, \mathcal{Y}_1, \ldots, \mathcal{Y}_{n_2})$ is also $(p, q, \eta)$-hypercontractive.

Proposition 3.15 [22] Let $\mathcal{X}$ be a $(2, q, \eta)$-hypercontractive sequence of ensembles and $Q$ a
multilinear polynomial over $\mathcal{X}$ of degree $d$. Then

\[ \|Q(\mathcal{X})\|_q \le \eta^{-d} \|Q(\mathcal{X})\|_2. \]

We end this section by recording some hypercontractive estimates to be used later. The
result for $\pm 1$ Rademacher variables is well known and due originally to Bonami [5] and
independently Beckner [3]; the same result for Gaussian and uniform random variables is also well known
and in fact follows easily from the Rademacher case. The optimal hypercontractivity constants for
general finite spaces were recently determined by Wolff [33] (see also [25]):

Theorem 3.16 Let $X$ denote either a uniformly random $\pm 1$ bit, a standard one-dimensional
Gaussian, or a random variable uniform on $[-\sqrt{3}, \sqrt{3}]$. Then $X$ is $(2, q, (q - 1)^{-1/2})$-hypercontractive.
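
For a single uniform $\pm 1$ bit the statement of Theorem 3.16 reduces to a two-point inequality that can be checked on a grid. The sketch below is only an illustration: every multilinear polynomial in one bit has the form $Q = a + bx$, the operator $T_\eta$ maps it to $a + \eta b x$, and the grid of $(a, b)$ values is an arbitrary choice. The last line tries a value of $\eta$ above $(q-1)^{-1/2}$ for $q = 4$.

```python
def norm_q(a, b, q):
    """||a + b*x||_q for a uniform +-1 bit x."""
    return (0.5 * abs(a + b) ** q + 0.5 * abs(a - b) ** q) ** (1.0 / q)


def is_hypercontractive(eta, q, grid):
    """Check ||T_eta Q||_q <= ||Q||_2 for all Q = a + b*x with (a, b) on the grid."""
    return all(norm_q(a, eta * b, q) <= norm_q(a, b, 2) + 1e-12 for a, b in grid)


grid = [(a / 10, b / 10) for a in range(-20, 21) for b in range(-20, 21)]

ok_q4 = is_hypercontractive(3 ** -0.5, 4, grid)    # eta = (q - 1)^{-1/2} with q = 4
ok_q3 = is_hypercontractive(2 ** -0.5, 3, grid)    # eta = (q - 1)^{-1/2} with q = 3
too_large = is_hypercontractive(0.7, 4, grid)      # eta above 3^{-1/2}
```

For $q = 4$ the check is also visible by hand: $E[(a + \eta b x)^4] = a^4 + 6\eta^2 a^2 b^2 + \eta^4 b^4$, which is at most $(a^2 + b^2)^2$ when $\eta^2 = 1/3$, while $\eta = 0.7$ already fails at $(a, b) = (1, 0.5)$.
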

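Theorem 3.17 [33] (Wolff) Let $X$ be any mean-zero random variable on a finite probability
space in which the minimum nonzero probability of any atom is $\alpha \le 1/2$. Then $X$ is
$(2, q, \eta_q(\alpha))$-hypercontractive, where

\[ \eta_q(\alpha) = \left( \frac{A^{1/q'} - A^{-1/q'}}{A^{1/q} - A^{-1/q}} \right)^{-1/2} \]

with $A = \frac{1 - \alpha}{\alpha}$ and $1/q + 1/q' = 1$.

Note the following special case:

Proposition 3.18 [33]

\[ \eta_3(\alpha) = \big(A^{1/3} + A^{-1/3}\big)^{-1/2} \sim \alpha^{1/6} \quad \text{as } \alpha \to 0, \]

and also

\[ \tfrac{1}{2} \alpha^{1/6} \le \eta_3(\alpha) \le 2^{-1/2} \]

for all $\alpha \in [0, 1/2]$.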
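
The two-sided bound on $\eta_3(\alpha)$ is simple to tabulate. The sketch below assumes the form of Wolff's constant $\eta_q(\alpha) = \big((A^{1/q'} - A^{-1/q'})/(A^{1/q} - A^{-1/q})\big)^{-1/2}$ with $A = (1-\alpha)/\alpha$ and $1/q + 1/q' = 1$; this formula is taken as an assumption here, and the code only checks what follows from it on a grid of $\alpha$ values.

```python
def eta_q(alpha, q):
    """Assumed Wolff constant for minimum atom probability alpha < 1/2."""
    A = (1.0 - alpha) / alpha
    q_conj = q / (q - 1.0)                     # conjugate exponent q'
    num = A ** (1.0 / q_conj) - A ** (-1.0 / q_conj)
    den = A ** (1.0 / q) - A ** (-1.0 / q)
    return (num / den) ** -0.5


def eta_3(alpha):
    """Closed form for q = 3: (A^{1/3} + A^{-1/3})^{-1/2}."""
    A = (1.0 - alpha) / alpha
    return (A ** (1.0 / 3.0) + A ** (-1.0 / 3.0)) ** -0.5


alphas = [j / 1000 for j in range(1, 500)]     # alpha in (0, 1/2)

# The q = 3 specialization of the general formula should match the closed form.
max_gap = max(abs(eta_q(a, 3.0) - eta_3(a)) for a in alphas)

# Two-sided bound: (1/2) alpha^{1/6} <= eta_3(alpha) <= 2^{-1/2}.
bounds_hold = all(0.5 * a ** (1.0 / 6.0) <= eta_3(a) <= 2 ** -0.5 + 1e-15 for a in alphas)
```

As $\alpha \to 1/2$ the general expression tends to $(q-1)^{-1/2}$, which matches the Rademacher case of Theorem 3.16.
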



3.5 Vector Hyper-Contraction 

For our purposes we will also need hypercontraction results in cases where $Q$ is a
$k$-dimensional (vector-valued) multi-linear polynomial.

Proposition 3.19 Let $\mathcal{X}$ be a $(2, q, \eta)$-hypercontractive sequence of ensembles and $Q$ a multilinear
polynomial over $\mathcal{X}$ of degree $d$ and dimension $k$. Assume $q$ is an integer and let $\mathbf{i}$ be a multi-index
with $|\mathbf{i}| = q$. Then

\[ E[|Q^{\mathbf{i}}(\mathcal{X})|] \le \eta^{-dq} \prod_{j=1}^k E[Q_j^2]^{i_j/2}. \]

Proof:

\[ E[|Q^{\mathbf{i}}(\mathcal{X})|] \le \prod_{j=1}^k \|Q_j\|_q^{i_j} \le \eta^{-dq} \prod_{j=1}^k \|Q_j\|_2^{i_j}, \]

where the first inequality is Hölder and the second follows by hypercontractivity. □

4 Multi-dimensional Invariance principle 

In this section we generalize the invariance principle from [22] to the multi-dimensional setting. We
omit some easy steps that are either identical to or easy adaptations of the proofs of [22].

4.1 Hypotheses for invariance theorems 

Below we will prove a generalization of the invariance principle of [22]. The invariance principle proven
there concerns a multilinear polynomial $Q$ over two hypercontractive sequences of ensembles, $\mathcal{X}$
and $\mathcal{Y}$; furthermore, $\mathcal{X}$ and $\mathcal{Y}$ are assumed to satisfy a "matching moments" condition, described
below.

It is possible to generalize the invariance principle to vector valued multi-linear polynomials
under each of the hypercontractivity assumptions H1, H2, H3 and H4 of [22]. However, since
the proof of all generalizations is essentially the same and since for the applications studied here
it suffices to consider the hypothesis H3, this is the only hypothesis that will be discussed in the
paper. It is defined as follows:

H3 Let $\mathcal{X}$ be a sequence of $n$ ensembles in which the random variables in each ensemble $\mathcal{X}_i$ form
a basis for the real-valued functions on some finite probability space $\Omega_i$. Further assume that
the least nonzero probability of any atom in any $\Omega_i$ is $\alpha \le 1/2$, and let $\eta = \frac{1}{2}\alpha^{1/6}$. Let $\mathcal{Y}$ be
any $(2, 3, \eta)$-hypercontractive sequence of ensembles such that all the random variables in $\mathcal{Y}$
are independent. Finally, let $Q$ be a $k$ dimensional multilinear polynomial as in (26).



4.2 Functional Setting 

The essence of our invariance principle is that if $Q$ is of bounded degree and has low influences then
the random variables $Q(\mathcal{X})$ and $Q(\mathcal{Y})$ are close in distribution. The simplest way to formulate this
conclusion is to say that if $\Psi : \mathbb{R}^k \to \mathbb{R}$ is a sufficiently nice "test function" then $\Psi(Q(\mathcal{X}))$ and
$\Psi(Q(\mathcal{Y}))$ are close in expectation.

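Theorem 4.1 Assume hypothesis H3. Further assume that $Q$ is a $k$ dimensional multi-linear
polynomial, that $\mathrm{Var}[Q] \le 1$, $\deg(Q) \le d$, and $\mathrm{Inf}_i(Q) \le \tau$ for all $i$. Let $\Psi : \mathbb{R}^k \to \mathbb{R}$ be a $C^3$
function with $|\Psi^{(\mathbf{i})}| \le B$ uniformly for every vector $\mathbf{i}$ with $|\mathbf{i}| = 3$. Then

\[ \big| E[\Psi(Q(\mathcal{X}))] - E[\Psi(Q(\mathcal{Y}))] \big| \le \epsilon := 2 d B k^3 \, (8 \alpha^{-1/2})^d \, \tau^{1/2}. \]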

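Proof: Note that by Proposition 3.18 the random variables satisfy $(2, 3, \eta)$ hypercontractivity
with $\eta = \frac{1}{2}\alpha^{1/6}$.

We begin by defining intermediate sequences between $\mathcal{X}$ and $\mathcal{Y}$. For $i = 0, 1, \ldots, n$, let $\mathcal{Z}^{(i)}$
denote the sequence of $n$ ensembles $(\mathcal{Y}_1, \ldots, \mathcal{Y}_i, \mathcal{X}_{i+1}, \ldots, \mathcal{X}_n)$ and let $Q^{(i)} = Q(\mathcal{Z}^{(i)})$. Our goal
will be to show

\[ \big| E[\Psi(Q^{(i-1)})] - E[\Psi(Q^{(i)})] \big| \le 2 B k^3 \eta^{-3d} \, \mathrm{Inf}_i(Q)^{3/2} \tag{27} \]

for each $i \in [n]$. Summing this over $i$ will complete the proof since $\mathcal{Z}^{(0)} = \mathcal{X}$, $\mathcal{Z}^{(n)} = \mathcal{Y}$, and

\[ \sum_{i=1}^n \mathrm{Inf}_i(Q)^{3/2} \le \tau^{1/2} \cdot \sum_{i=1}^n \mathrm{Inf}_i(Q) = \tau^{1/2} \cdot \sum_{i=1}^n \mathrm{Inf}_i^{\le d}(Q) \le d \tau^{1/2}, \]

where we used Proposition 3.8 and $\sum_j \mathrm{Var}[Q_j] \le 1$.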

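Let us fix a particular $i \in [n]$ and proceed to prove (27). Given a multi-index $\sigma$, write $\sigma \setminus i$ for
the same multi-index except with $\sigma_i = 0$. Now write

\[ \tilde{Q} = \sum_{\sigma : \sigma_i = 0} c_\sigma \mathcal{Z}^{(i)}_\sigma, \qquad
R = \sum_{\sigma : \sigma_i > 0} c_\sigma X_{i,\sigma_i} \cdot \mathcal{Z}^{(i)}_{\sigma \setminus i}, \qquad
S = \sum_{\sigma : \sigma_i > 0} c_\sigma Y_{i,\sigma_i} \cdot \mathcal{Z}^{(i)}_{\sigma \setminus i}. \]

Note that $\tilde{Q}$ and the variables $\mathcal{Z}^{(i)}_{\sigma \setminus i}$ are independent of the variables in $\mathcal{X}_i$ and $\mathcal{Y}_i$ and that
$Q^{(i-1)} = \tilde{Q} + R$ and $Q^{(i)} = \tilde{Q} + S$.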

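To bound the left side of (27), i.e., $|E[\Psi(\tilde{Q} + R) - \Psi(\tilde{Q} + S)]|$, we use Taylor's theorem:
for all $x, y \in \mathbb{R}^k$,

\[ \Big| \Psi(x + y) - \sum_{|\mathbf{k}| < 3} \frac{\Psi^{(\mathbf{k})}(x) \, y^{\mathbf{k}}}{\mathbf{k}!} \Big| \le \sum_{|\mathbf{k}| = 3} \frac{B \, |y^{\mathbf{k}}|}{\mathbf{k}!}. \]

In particular,

\[ \Big| E[\Psi(\tilde{Q} + R)] - \sum_{|\mathbf{k}| < 3} E\Big[\frac{\Psi^{(\mathbf{k})}(\tilde{Q}) \, R^{\mathbf{k}}}{\mathbf{k}!}\Big] \Big| \le \sum_{|\mathbf{k}| = 3} \frac{B}{\mathbf{k}!} E[|R^{\mathbf{k}}|] \tag{28} \]

and similarly,

\[ \Big| E[\Psi(\tilde{Q} + S)] - \sum_{|\mathbf{k}| < 3} E\Big[\frac{\Psi^{(\mathbf{k})}(\tilde{Q}) \, S^{\mathbf{k}}}{\mathbf{k}!}\Big] \Big| \le \sum_{|\mathbf{k}| = 3} \frac{B}{\mathbf{k}!} E[|S^{\mathbf{k}}|]. \tag{29} \]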

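We will see below that $R$ and $S$ have finite 3rd moments. Moreover, for $0 \le \mathbf{k} \le \mathbf{r}$ with $|\mathbf{r}| = 3$ it
holds that $|\Psi^{(\mathbf{k})}(\tilde{Q}) R^{\mathbf{k}}| \le |\mathbf{k}|! \, B \, |\tilde{Q}^{\mathbf{r} - \mathbf{k}} R^{\mathbf{k}}|$ (and similarly for $S$). Thus all moments above are finite.
We now claim that for all $0 \le |\mathbf{k}| < 3$ it holds that

\[ E[\Psi^{(\mathbf{k})}(\tilde{Q}) R^{\mathbf{k}}] = E[\Psi^{(\mathbf{k})}(\tilde{Q}) S^{\mathbf{k}}]. \tag{30} \]

This follows from the fact that the expressions in the expected values, when viewed as multi-linear
polynomials in the variables in $\mathcal{X}_i$ and $\mathcal{Y}_i$ respectively, are of degree $\le 2$ and each monomial term
in $\mathcal{X}_i$ has the same coefficient as the corresponding monomial in $\mathcal{Y}_i$.
From (28), (29) and (30) it follows that

\[ \big| E[\Psi(\tilde{Q} + R) - \Psi(\tilde{Q} + S)] \big| \le \sum_{|\mathbf{r}| = 3} \frac{B}{\mathbf{r}!} \big( E[|R^{\mathbf{r}}|] + E[|S^{\mathbf{r}}|] \big). \tag{31} \]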



We now use hypercontractivity. By Proposition 3.14 each $\mathcal{Z}^{(i)}$ is $(2, 3, \eta)$-hypercontractive. Thus
by Proposition 3.19

\[ E[|R^{\mathbf{r}}|] \le \eta^{-3d} \prod_j E[R_j^2]^{r_j/2}, \qquad E[|S^{\mathbf{r}}|] \le \eta^{-3d} \prod_j E[S_j^2]^{r_j/2}. \tag{32} \]

However,

\[ E[S_j^2] = E[R_j^2] = \sum_{\sigma : \sigma_i > 0} (c_\sigma^j)^2 = \mathrm{Inf}_i(Q_j) \le \mathrm{Inf}_i(Q). \tag{33} \]

Combining (31), (32) and (33) it follows that

\[ \big| E[\Psi(\tilde{Q} + R) - \Psi(\tilde{Q} + S)] \big| \le 2 B k^3 \eta^{-3d} \cdot \mathrm{Inf}_i(Q)^{3/2}, \]

confirming (27) and completing the proof. □
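
A one-dimensional instance makes the statement of Theorem 4.1 tangible. The sketch below is an illustration under explicit assumptions: $Q(x) = n^{-1/2}\sum_i x_i$, the ensembles $\mathcal{X}_i = \{1, x_i\}$ are uniform $\pm 1$ bits (so $\alpha = 1/2$), $\mathcal{Y}$ is the Gaussian ensemble, and the test function is $\Psi = \cos$, whose third derivative is bounded by $B = 1$. Both expectations are available in closed form, so no sampling is needed.

```python
import math


def expected_cos_rademacher(n):
    """E[cos(S / sqrt(n))] for S a sum of n independent uniform +-1 bits.

    Exact, because E[exp(i t S)] = cos(t)^n is real.
    """
    return math.cos(1.0 / math.sqrt(n)) ** n


# E[cos(G)] for a standard Gaussian G (real part of the characteristic function at 1).
expected_cos_gaussian = math.exp(-0.5)


def theorem_bound(n, d=1, B=1.0, k=1, alpha=0.5):
    """epsilon = 2 d B k^3 (8 alpha^{-1/2})^d tau^{1/2} with tau = 1/n."""
    tau = 1.0 / n   # every coordinate of Q has influence exactly 1/n
    return 2 * d * B * k ** 3 * (8 * alpha ** -0.5) ** d * math.sqrt(tau)


sizes = [1, 10, 100, 1000]
gaps = [abs(expected_cos_rademacher(n) - expected_cos_gaussian) for n in sizes]
bounds = [theorem_bound(n) for n in sizes]
```

In this smooth, symmetric example the true gap decays roughly like $1/n$, much faster than the $\tau^{1/2} = n^{-1/2}$ rate guaranteed by the theorem, so the bound is far from tight here; the example only shows how the parameters $d$, $B$, $k$, $\alpha$ and $\tau$ enter.
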

4.3 Invariance principle — other functionals, and smoothed version 

The basic invariance principle shows that $E[\Psi(Q(\mathcal{X}))]$ and $E[\Psi(Q(\mathcal{Y}))]$ are close if $\Psi$ is a $C^3$
functional with bounded 3rd derivative. To show that the distributions of $Q(\mathcal{X})$ and $Q(\mathcal{Y})$ are close
in other senses we need the invariance principle for less smooth functionals. This we can obtain
using straightforward approximation arguments, see for example [22]. For applications involving
bounded functions, it will be important to bound the following functionals. We let $f_{[0,1]} : \mathbb{R} \to \mathbb{R}$
be defined by

\[ f_{[0,1]}(x) = \max(\min(x, 1), 0) = \min(\max(x, 0), 1), \]

and $\zeta : \mathbb{R}^k \to \mathbb{R}$ be defined by

\[ \zeta(x) = \sum_{i=1}^k \big(x_i - f_{[0,1]}(x_i)\big)^2. \tag{34} \]

Similarly, we define

\[ \chi(x) = \prod_{i=1}^k f_{[0,1]}(x_i). \tag{35} \]

Repeating the proofs of [22] one obtains:

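Theorem 4.2 Assume hypothesis H3. Further assume that $Q = (Q_1, \ldots, Q_k)$ is a $k$-dimensional
multi-linear polynomial with $\mathrm{Var}[Q] \le 1$ and $\mathrm{Inf}_i^{\le \log(1/\tau)/K}(Q) \le \tau$ for all $i$, where

\[ K = \log(1/\alpha). \]

Suppose further that for all $d$ it holds that $\mathrm{Var}[Q^{>d}] \le (1 - \gamma)^{2d}$ where $0 < \gamma < 1$. Write $R = Q(\mathcal{X})$
and $S = Q(\mathcal{Y})$. Then

\[ \big| E[\zeta(R)] - E[\zeta(S)] \big| \le \tau^{\Omega(\gamma/K)}, \]

where $\Omega(\cdot)$ hides a constant depending only on $k$. Similarly,

\[ \big| E[\chi(R)] - E[\chi(S)] \big| \le \tau^{\Omega(\gamma/K)}. \]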

Proof: The proof for $\zeta$ uses the fact that the function $\zeta$ admits approximations $\zeta_\lambda$ such that

\[ \|\zeta - \zeta_\lambda\|_\infty = O(\lambda^2), \]

\[ \|\zeta_\lambda^{(\mathbf{r})}\|_\infty = O(\lambda^{-1}), \]

for all $\mathbf{r}$ with $|\mathbf{r}| = 3$.
This implies that for all $k$ dimensional degree $d$ polynomials we have:

\[ \big| E[\zeta(R)] - E[\zeta(S)] \big| \le O(\alpha^{-d/3} \tau^{1/3}). \tag{36} \]

See [22] for details. In order to obtain the result for polynomials with decaying tails we use the
fact that

\[ |\zeta(a_1 + b_1, \ldots, a_k + b_k) - \zeta(a_1, \ldots, a_k)| \le 2 \sum_{i=1}^k \big(|a_i b_i| + b_i^2\big). \]

This implies that evaluating $\zeta$ at polynomials truncated at level $d$ results in an error of at most
$O(\exp(-d\gamma))$ which together with the bound (36) implies the desired bound for $\zeta$.
The proof for $\chi$ is similar as the function $\chi$ admits approximations $\chi_\lambda$ such that

\[ \|\chi - \chi_\lambda\|_\infty = O(\lambda), \]

\[ \|\chi_\lambda^{(\mathbf{r})}\|_\infty = O(\lambda^{-2}), \]

for all $\mathbf{r}$ with $|\mathbf{r}| = 3$.



□ 
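
The functionals $\zeta$ and $\chi$ and the perturbation bound used for truncation are easy to experiment with. The sketch below is an illustrative check, not part of the proof: it implements $f_{[0,1]}$, $\zeta$ and $\chi$ as defined in (34) and (35), and tests an inequality of the form $|\zeta(a + b) - \zeta(a)| \le 2\sum_i (|a_i b_i| + b_i^2)$ on a grid in dimension $k = 2$. The constant 2 is what the one-dimensional estimate $|\zeta(a+b) - \zeta(a)| \le |b|(2|a| + |b|)$ gives; the grid is an arbitrary choice.

```python
from itertools import product


def clip01(x):
    """f_[0,1](x) = max(min(x, 1), 0)."""
    return max(min(x, 1.0), 0.0)


def zeta(xs):
    """Squared Euclidean distance from xs to the cube [0,1]^k."""
    return sum((x - clip01(x)) ** 2 for x in xs)


def chi(xs):
    """Product of the coordinates after clipping each to [0,1]."""
    out = 1.0
    for x in xs:
        out *= clip01(x)
    return out


grid = [v / 2 for v in range(-6, 7)]   # -3.0, -2.5, ..., 3.0

# Largest value of lhs - rhs over the grid; it should never be positive.
worst = 0.0
for a1, a2, b1, b2 in product(grid, repeat=4):
    lhs = abs(zeta((a1 + b1, a2 + b2)) - zeta((a1, a2)))
    rhs = 2 * (abs(a1 * b1) + b1 ** 2 + abs(a2 * b2) + b2 ** 2)
    worst = max(worst, lhs - rhs)

# A point showing that the constant 1 would not suffice: a = -1, b = -1.
gap_at_minus_one = zeta((-2.0,)) - zeta((-1.0,))
```

At $a = b = -1$ the left side equals 3 while $|ab| + b^2 = 2$, so some constant larger than 1 is needed in front of the sum.
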



Of particular interest to us is the following corollary. 



Corollary 4.3 Assume hypothesis H3. For each $1 \le i \le n$ and $1 \le j \le k$, let $\mathcal{X}_i^j$ denote
an ensemble all of whose elements are linear combinations of elements in $\mathcal{X}_i$. Similarly for each
$1 \le i \le n$ and $1 \le j \le k$, let $\mathcal{Y}_i^j$ denote an ensemble all of whose elements are linear combinations
of elements in $\mathcal{Y}_i$.

Assume further that for all $i$ and $j$ it holds that $|\mathcal{Y}_i^j| = |\mathcal{X}_i^j|$ and that there is a one to one
correspondence between $Y \in \mathcal{Y}_i^j$ given by:

\[ Y = \sum_{r=0}^{|\mathcal{Y}_i| - 1} a(i,j,r) \, Y_r, \]

and $X \in \mathcal{X}_i^j$ given by:

\[ X = \sum_{r=0}^{|\mathcal{X}_i| - 1} a(i,j,r) \, X_r, \]

where $\mathcal{Y}_i = \{Y_0, \ldots, Y_{|\mathcal{Y}_i| - 1}\}$ and $\mathcal{X}_i = \{X_0, \ldots, X_{|\mathcal{X}_i| - 1}\}$.

Let $Q = (Q_1, \ldots, Q_k)$ be a $k$-dimensional multi-linear polynomial with $\mathrm{Var}[Q] \le 1$ and $\mathrm{Inf}_i^{\le \log(1/\tau)/K}(Q) \le \tau$
for all $i$ where

\[ K = \log(1/\alpha). \]

Suppose further that for all $d$ it holds that $\mathrm{Var}[Q_j^{>d}] \le (1 - \gamma)^{2d}$ where $0 < \gamma < 1$. Write
$R = (Q_1(\mathcal{X}^1), \ldots, Q_k(\mathcal{X}^k))$ and $S = (Q_1(\mathcal{Y}^1), \ldots, Q_k(\mathcal{Y}^k))$. Then

\[ \big| E[\zeta(R)] - E[\zeta(S)] \big| \le \tau^{\Omega(\gamma/K)} \]

and

\[ \big| E[\chi(R)] - E[\chi(S)] \big| \le \tau^{\Omega(\gamma/K)}, \]

where the $\Omega(\cdot)$ hides a constant depending only on $k$.

Proof: The proof follows immediately from the previous theorem noting that $\mathrm{Inf}_i$ and $\mathrm{Var}[Q^{>d}]$
are basis independent. □

5 Noise in Gaussian Space



In this section we derive the Gaussian correlation bounds needed for our applications. The
first bound, derived in subsection 5.1, is an easy extension of [?]. The second one gives a quantitative
estimate on iterations of the first bound that will be needed for some of the applications.

5.1 Noise stability in Gaussian space 

We begin by recalling some definitions and results relevant for "Gaussian noise stability".
Throughout this section we consider $\mathbb{R}^n$ to have the standard $n$-dimensional Gaussian distribution, and our
probabilities and expectations are over this distribution. Recall Definition 1.12. We denote by $U_\rho$
the Ornstein-Uhlenbeck operator acting on $L^2(\mathbb{R}^n)$ by

\[ (U_\rho f)(x) = E_y\big[f(\rho x + \sqrt{1 - \rho^2}\, y)\big], \]

where $y$ is a standard $n$-dimensional Gaussian.

It is immediate to see that if $f, g \in L^2(\mathbb{R}^n)$ then

\[ E[f \, U_\rho g] = E[g \, U_\rho f] = E[f(X_1, \ldots, X_n) \, g(Y_1, \ldots, Y_n)] \]

where $(X_i, Y_i)$ are independent two dimensional Gaussian vectors with covariance matrix $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.
The results of Borell [6] imply the following (see [22] for more details):

Theorem 5.1 Let $f, g : \mathbb{R}^n \to [0, 1]$ be two measurable functions on Gaussian space with $E[f] = \mu$
and $E[g] = \nu$. Then for all $0 \le \rho \le 1$ we have

\[ \underline{\Gamma}_\rho(\mu, \nu) \le E[f \, U_\rho g] \le \overline{\Gamma}_\rho(\mu, \nu). \]

We will need the following corollary. 

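Corollary 5.2 Let $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ be jointly Gaussian such that $X_1, \ldots, X_n$ are
independent, $Y_1, \ldots, Y_m$ are independent and

\[ \sup_{\|\alpha\|_2 = \|\beta\|_2 = 1} \Big| \mathrm{Cov}\Big[\sum_i \alpha_i X_i, \sum_i \beta_i Y_i\Big] \Big| \le \rho. \]

Let $f_1 : \mathbb{R}^n \to [0, 1]$, $f_2 : \mathbb{R}^m \to [0, 1]$ with $E[f_j] = \mu_j$. Then

\[ \underline{\Gamma}_\rho(\mu_1, \mu_2) \le E[f_1 f_2] \le \overline{\Gamma}_\rho(\mu_1, \mu_2). \]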

Proof:

Note that if $m > n$ then we may define $X_{n+1}, \ldots, X_m$ to be Gaussian and independent of all
the other variables. This implies that without loss of generality we may assume that $m = n$.
We claim that without loss of generality the covariance matrix between $X$ and $Y$ is given by

\[ \mathrm{Cov}[X_i, Y_j] = 0 \]

for $i \ne j$ and $|\mathrm{Cov}[X_i, Y_i]| \le \rho$ for all $i$. To see this, take $\|\alpha\|_2 = \|\beta\|_2 = 1$ which maximizes
$\mathrm{Cov}[\sum_i \alpha_i X_i, \sum_i \beta_i Y_i]$ and define $\tilde{X}_1 = \sum_i \alpha_i X_i$, $\tilde{Y}_1 = \sum_i \beta_i Y_i$ and $\tilde{X}_2, \ldots, \tilde{X}_n$ to be an orthonormal
basis of the projection of the span of $\{X_2, \ldots, X_n\}$ to the orthogonal complement of the space
spanned by $\tilde{X}_1$, and similarly $\tilde{Y}_2, \ldots, \tilde{Y}_n$. It is easy to check that $\mathrm{Cov}[\tilde{X}_1, \tilde{Y}_i] = \mathrm{Cov}[\tilde{Y}_1, \tilde{X}_i] = 0$ for
$i > 1$. Repeating this process for $\tilde{X}_2, \ldots, \tilde{X}_n$ and $\tilde{Y}_2, \ldots, \tilde{Y}_n$ etc. we obtain the desired covariance
matrix.

Write $\mathrm{Cov}[X, Y] = \rho J$, where $J$ is diagonal with all entries in $[-1, 1]$. Then clearly

\[ E[f_1(X_1, \ldots, X_n) f_2(Y_1, \ldots, Y_n)] = E[f_1 \, U_\rho(U_J f_2)], \]

where $U_\rho$ is the Ornstein-Uhlenbeck operator and $U_J$ is the operator defined by

\[ (U_J f)(x_1, \ldots, x_n) = E\Big[f\Big(J_{1,1} x_1 + \sqrt{1 - J_{1,1}^2}\, y_1, \ldots, J_{n,n} x_n + \sqrt{1 - J_{n,n}^2}\, y_n\Big)\Big], \]

where $y$ is distributed according to the Gaussian measure. Since $U_J$ is a Markov operator, we have
$E[U_J f_2] = E[f_2]$ and $0 \le U_J f_2 \le 1$. Now applying Borell's result we obtain the desired conclusion.
□

We note that in general there is no closed form for $\overline{\Gamma}_\rho(\mu, \nu)$; however, for balanced functions we
have Sheppard's formula [?]: $\overline{\Gamma}_\rho(1/2, 1/2) = \frac{1}{4} + \frac{1}{2\pi} \arcsin \rho$. Finally we record a fact to be used
later.
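
Sheppard's formula can be confirmed by direct numerical integration. The sketch below assumes that $\overline{\Gamma}_\rho(1/2, 1/2)$ is the orthant probability $P[X \le 0, Y \le 0]$ for a standard bivariate normal pair with correlation $\rho$ (Definition 1.12 is not reproduced in this section, so this reading is an assumption). It conditions on $X = x$, uses $P[Y \le 0 \mid X = x] = \Phi(-\rho x / \sqrt{1 - \rho^2})$, and integrates with Simpson's rule.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def orthant_probability(rho, lower=-12.0, steps=4000):
    """P[X <= 0, Y <= 0] for correlation rho, by Simpson's rule on [lower, 0]."""
    s = math.sqrt(1.0 - rho * rho)
    g = lambda x: phi(x) * Phi(-rho * x / s)
    h = (0.0 - lower) / steps          # steps must be even for Simpson's rule
    total = g(lower) + g(0.0)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(lower + j * h)
    return total * h / 3.0


def sheppard(rho):
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


rhos = [0.0, 0.3, 0.5, 0.9]
errors = [abs(orthant_probability(r) - sheppard(r)) for r in rhos]
```

For $\rho = 1/2$ the formula gives $1/4 + 1/12 = 1/3$, which the quadrature reproduces to high accuracy.
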

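Proposition 5.3 Let $((X_{i,j}, Y_{i,j})_j)_{i=1}^n$ be jointly Gaussian, each distributed $N(0,1)$. Suppose
further that for each $i$:

\[ \sup_{\|\alpha\|_2 = \|\beta\|_2 = 1} \Big| \mathrm{Cov}\Big[\sum_j \alpha_j X_{i,j}, \sum_j \beta_j Y_{i,j}\Big] \Big| \le \rho, \]

and that the $n$ collections $((X_{i,j}, Y_{i,j})_j)_{i=1}^n$ are independent. Then we have:

\[ \sup_{\|\alpha\|_2 = \|\beta\|_2 = 1} \Big| \mathrm{Cov}\Big[\sum_{i,j} \alpha_{i,j} X_{i,j}, \sum_{i,j} \beta_{i,j} Y_{i,j}\Big] \Big| \le \rho. \]

Proof: Using the fact that a linear combination of Gaussians is a Gaussian it suffices to show that
if $(X_i, Y_i)$ are independent Gaussian vectors, each satisfying $|\mathrm{Cov}(X_i, Y_i)| \le \rho$, $X_i, Y_i \sim N(0,1)$
and $\|\alpha\|_2 = \|\beta\|_2 = 1$ then

\[ \Big| \mathrm{Cov}\Big[\sum_i \alpha_i X_i, \sum_i \beta_i Y_i\Big] \Big| \le \rho. \]

This follows immediately from Cauchy-Schwarz:

\[ \Big| \mathrm{Cov}\Big[\sum_i \alpha_i X_i, \sum_i \beta_i Y_i\Big] \Big| = \Big| \sum_i \alpha_i \beta_i \mathrm{Cov}[X_i, Y_i] \Big| \le \rho \sum_i |\alpha_i \beta_i| \le \rho. \]

□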

5.2 Asymptotics of $\Gamma()$

In some of the applications below we will need to estimate $\overline{\Gamma}_{\rho_1, \ldots, \rho_{k-1}}(\mu_1, \ldots, \mu_k)$. In particular, we
will need the following estimate.

Lemma 5.4 Let $0 < \mu < 1$ and $0 < \rho < 1$. Define

\[ B_k(\rho, \mu) = \overline{\Gamma}_{\rho_1, \ldots, \rho_{k-1}}(\mu_1, \ldots, \mu_k), \]

where $\mu_1 = \ldots = \mu_k = \mu$ and $\rho_1 = \ldots = \rho_{k-1} = \rho$. Then as $k \to \infty$ we have

\[ B_k(\rho, \mu) \le k^{\frac{\rho^2 - 1}{\rho^2} + o(1)}. \tag{37} \]

Proof: Clearly, we have

\[ B_{j+1}(\rho, \mu) = \overline{\Gamma}_\rho(\mu, B_j). \tag{38} \]

The proof proceeds by deriving bounds on recursion (38). This is a straightforward (but not very
elegant) calculation with Gaussians. Writing $B_j$ for $B_j(\rho, \mu)$, the main two steps in verifying (37)
are to show that

• The sequence $B_j$ converges to 0 as $j \to \infty$. This follows from the fact that the functions
$B \to \overline{\Gamma}_\rho(\mu, B)$ are easily seen to be strictly decreasing and have no fixed points other than
$B = 0$ and $B = 1$ when $0 < \mu < 1$.

• Using Gaussian estimates sketched below, we see that for $B_{j-1}$ sufficiently small it holds that

\[ B_j \le B_{j-1} - B_{j-1}^{1 + \frac{\rho^2}{1 - \rho^2} + o(1)}. \tag{39} \]

This corresponds to $B_j$ of the form

\[ B_j \le C j^{-\frac{1 - \rho^2}{\rho^2} + o(1)}. \]

More formally, it is easy to see that if $B_{j-1}$ is sufficiently small and satisfies

\[ B_{j-1} \le C (j - 1)^{-\alpha} \]

and

\[ B_j \le B_{j-1}\big(1 - \tfrac{1}{2} B_{j-1}^{1/\alpha}\big) \]

for $\alpha > 0$ then

\[ B_j \le C j^{-\alpha}. \]

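This follows using the fact that for small values of $\delta$ the maximum of the function $x(1 - x^{1/\alpha}/2)$
in the interval $[0, \delta]$ is obtained at $\delta$ and therefore:

\[ B_j \le B_{j-1}\big(1 - \tfrac{1}{2} B_{j-1}^{1/\alpha}\big) \le C(j-1)^{-\alpha}\Big(1 - \tfrac{1}{2}\big(C(j-1)^{-\alpha}\big)^{1/\alpha}\Big) = C(j-1)^{-\alpha}\big(1 - \tfrac{1}{2} C^{1/\alpha}(j-1)^{-1}\big). \]

In order that

\[ B_j \le C j^{-\alpha}, \]

we need that

\[ \frac{1}{1 - \tfrac{1}{2} C^{1/\alpha}(j-1)^{-1}} \ge \Big(\frac{j}{j-1}\Big)^{\alpha}, \]

which holds for large enough value of $C$.

In order to obtain (39) for small values of $B_{j-1}$, one uses the lemma stated below together with
the approximation

\[ \Phi(-t) = \exp\big(-(1 + o_t(1))\tfrac{t^2}{2}\big), \]

which implies that for every fixed $\epsilon > 0$:

\[ \Phi\Big(-\frac{\rho(t + \epsilon)}{\sqrt{1 - \rho^2}}\Big) = \Phi(-t)^{\frac{\rho^2}{1 - \rho^2} + o_t(1)}. \]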

□ 
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Lemma 5.5 Let (X, Y) be a bi-variate normal with Var[X] = Var[Y] = 1 and CovpT, Y] = p. 

Then for all e > and t > it holds that 



*(-t)-r p (l/2, $(-*)) = P[X < -t] -P[X < -t,Y < 0] = P[X < -t,Y > 0] 

> $(_ t )(i_exp(-te-e 2 /2))$(-4ii4) 



y/l-p 

Proof: The equalities follow by the definitions. For the inequality, we write

$$P[X < -t, Y > 0] \ge P[X < -t]\, P[Y > 0, X > -t-\epsilon \mid X < -t]$$

$$= P[X < -t]\, P[X > -t-\epsilon \mid X < -t]\, P[Y > 0 \mid -t > X > -t-\epsilon].$$

The bound in the lemma follows by bounding each of the three terms, starting with $P[X < -t] = \Phi(-t)$. Then note that

$$\frac{P[X < -t-\epsilon]}{P[X < -t]} \le \exp(-t\epsilon - \epsilon^2/2),$$

and therefore

$$P[X > -t-\epsilon \mid X < -t] \ge 1 - \exp(-t\epsilon - \epsilon^2/2).$$

Finally, writing $Z$ for a $N(0, 1)$ variable that is independent of $X$ we obtain

$$P[Y > 0 \mid -t > X > -t-\epsilon] \ge P[Y > 0 \mid X = -t-\epsilon] = P[\rho(-t-\epsilon) + \sqrt{1-\rho^2}\, Z > 0] = \Phi\left(-\frac{\rho(t+\epsilon)}{\sqrt{1-\rho^2}}\right),$$

as needed. □
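As a sanity check on the lemma (not part of the original proof), the following Python sketch compares $P[X < -t, Y > 0]$, computed by numerical integration, with the lower bound of the lemma. It assumes the bound reads $\Phi(-t)(1 - e^{-t\epsilon - \epsilon^2/2})\Phi(-\rho(t+\epsilon)/\sqrt{1-\rho^2})$; the function names and the integration scheme are choices made for this illustration.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tail_and_positive(t: float, rho: float, steps: int = 20000) -> float:
    """P[X < -t, Y > 0] for a standard bivariate normal with correlation rho.

    Conditions on X = x, where P[Y > 0 | X = x] = Phi(rho * x / sqrt(1 - rho^2)),
    and applies composite Simpson integration on [-t - 12, -t]; the mass
    below -t - 12 is negligible at double precision.
    """
    lo, hi = -t - 12.0, -t
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    s = math.sqrt(1.0 - rho * rho)

    def g(x: float) -> float:
        return phi(x) * Phi(rho * x / s)

    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0


def lemma_lower_bound(t: float, rho: float, eps: float) -> float:
    """Right-hand side of the lemma, as reconstructed above."""
    s = math.sqrt(1.0 - rho * rho)
    return (Phi(-t) * (1.0 - math.exp(-t * eps - eps * eps / 2.0))
            * Phi(-rho * (t + eps) / s))


if __name__ == "__main__":
    t, rho, eps = 2.0, 0.5, 0.5  # illustrative parameters
    print(tail_and_positive(t, rho), ">=", lemma_lower_bound(t, rho, eps))
```

The parameter $\epsilon$ trades off the two factors of the bound: a larger $\epsilon$ improves the factor $1 - e^{-t\epsilon - \epsilon^2/2}$ but worsens the last Gaussian tail.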

6 Gaussian Bounds on Non-reversible Noise forms 

In this section we prove the main results of the paper: Theorem 1.14 and its relaxations Propositions
1.15 and 1.16. As in previous work [21, 22, 7], the proof idea is to use an invariance principle,
in this case Theorem 4.2, together with the Gaussian bounds of Section 5.

Since the invariance principle requires working either with low degree polynomials, or polyno- 
mials that have exponentially decaying weight, an important step of the proof is the reduction to 
this case. This reduction is proved in subsection 6.1. It is based on the fact that $\rho < 1$ and on the
properties of correlated spaces and Efron-Stein decompositions derived in Section 2.

The reduction, Theorem 4.2 and truncation arguments allow us to prove Theorem 1.14 for $k = 2$
in subsection 6.2 and for $k > 2$ in subsection 6.3.

The relaxed conditions on the influences for $k = 2$ and for r-wise independent distributions are
derived in subsection 6.4 using a "two-threshold" technique. A related technique has been used
before in [HIT]. However, the variant presented here is more elegant, gives more explicit dependency 
on the influences and allows one to exploit s-wise independence.

Finally, using a recursive argument we derive Proposition 1.16 in subsection 6.5. We don't know
of any previous application of this idea in the context of the study of influences.

6.1 Noise forms are determined by low degree expansion

In order to use the invariance principle, it is crucial to apply it to multi-linear polynomials that
are either of low degree or well approximated by their low degree part. Here we show that noise
stability quantities do not change by much if one replaces a function by a slight smoothing of it.
For the following statement recall Definition 1.9 for the definition of $\rho$ and (17), (18) and (25) for
the definition of the Bonami-Beckner operator $T_{1-\gamma}$.

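Lemma 6.1 Let $\Omega_1, \ldots, \Omega_n, \Lambda_1, \ldots, \Lambda_n$ be a collection of finite probability spaces. Let $\mathcal{X}, \mathcal{Y}$ be two
ensembles of collections of random variables such that $(\mathcal{X}_i, \mathcal{Y}_i)$ are independent and $\mathcal{X}_i$ is a
basis for the functions in $\Omega_i$, $\mathcal{Y}_i$ is a basis for the functions in $\Lambda_i$. Suppose further that for all $i$ it
holds that $\rho(\Omega_i, \Lambda_i) \le \rho$.

Let $P$ and $Q$ be two multi-linear polynomials. Let $\epsilon > 0$ and $\gamma$ be chosen sufficiently close to $0$
so that

$$\gamma \le 1 - (1 - \epsilon)^{\log \rho/(\log \epsilon + \log \rho)}.$$

Then:

$$|E[P(\mathcal{X})Q(\mathcal{Y})] - E[T_{1-\gamma}P(\mathcal{X})\, T_{1-\gamma}Q(\mathcal{Y})]| \le 2\epsilon\, \mathrm{Var}[P]\, \mathrm{Var}[Q].$$

In particular, there exists an absolute constant $C$ such that it suffices to take

$$\gamma = C\frac{(1-\rho)\epsilon}{\log(1/\epsilon)}.$$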

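Proof: Without loss of generality it suffices to assume that $\mathrm{Var}[P] = \mathrm{Var}[Q] = 1$ and show that

$$|E[P(\mathcal{X})Q(\mathcal{Y})] - E[P(\mathcal{X})(T_{1-\gamma}Q)(\mathcal{Y})]| = |E[P(\mathcal{X})((I - T_{1-\gamma})Q)(\mathcal{Y})]| \le \epsilon.$$

Let $T$ be the Markov operator defined by $Tg(x) = E[g(Y) \mid X = x]$, where $(X, Y)$ are distributed
according to $(\mathcal{X}, \mathcal{Y})$. In order to prove the lemma it suffices to show that

$$|E[P(\mathcal{X})(T(I - T_{1-\gamma})Q)(\mathcal{X})]| \le \epsilon. \qquad (40)$$

Write $P$ and $Q$ in terms of their Efron-Stein decomposition, that is,

$$P = \sum_{S \subseteq [n]} P_S, \qquad Q = \sum_{S \subseteq [n]} Q_S.$$

It is easy to see that

$$(I - T_{1-\gamma})Q_S = (1 - (1-\gamma)^{|S|})Q_S,$$

and Propositions 2.11 and 2.12 imply that

$$\|TQ_S\|_2 \le \rho^{|S|}\|Q_S\|_2,$$

and that $TQ_S$ is orthogonal to $P_{S'}$ for $S' \ne S$. Writing $T' = T(I - T_{1-\gamma})$ we conclude that

$$\|T'Q_S\|_2 \le \min(\rho^{|S|}, 1 - (1-\gamma)^{|S|})\|Q_S\|_2 \le \epsilon\|Q_S\|_2,$$

and that $T'Q_S$ is orthogonal to $P_{S'}$ when $S' \ne S$.
By Cauchy-Schwarz we get that

$$|E[P(\mathcal{X})(T'Q)(\mathcal{X})]| = \left|\sum_{S \ne \emptyset} E[P_S\, T'Q_S]\right| \le \sqrt{\mathrm{Var}[P]}\sqrt{\sum_{S \ne \emptyset}\|T'Q_S\|_2^2} \le \epsilon\sqrt{\mathrm{Var}[P]}\sqrt{\mathrm{Var}[Q]},$$

as needed.
□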
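The key step of the proof is that, for $\gamma$ at the threshold of the lemma, the multiplier $\min(\rho^{|S|}, 1 - (1-\gamma)^{|S|})$ is at most $\epsilon$ for every set size $|S| \ge 1$. The following Python sketch checks this numerically; it is an illustration under the assumption that the threshold reads $1 - (1-\epsilon)^{\log\rho/(\log\epsilon + \log\rho)}$, and the function names are hypothetical.

```python
import math


def gamma_threshold(eps: float, rho: float) -> float:
    """Right-hand side of the condition on gamma in the lemma."""
    exponent = math.log(rho) / (math.log(eps) + math.log(rho))
    return 1.0 - (1.0 - eps) ** exponent


def worst_multiplier(eps: float, rho: float, max_size: int = 2000) -> float:
    """Max over 1 <= |S| <= max_size of min(rho^|S|, 1 - (1 - gamma)^|S|)."""
    gamma = gamma_threshold(eps, rho)
    return max(
        min(rho ** s, 1.0 - (1.0 - gamma) ** s) for s in range(1, max_size + 1)
    )


if __name__ == "__main__":
    for eps, rho in [(0.1, 0.5), (0.01, 0.9)]:  # illustrative parameters
        print(eps, rho, gamma_threshold(eps, rho), worst_multiplier(eps, rho))
```

Small sets are handled by the smoothing term $1 - (1-\gamma)^{|S|}$ and large sets by the correlation decay $\rho^{|S|}$; the threshold on $\gamma$ is chosen so that the two regimes meet at level $\epsilon$.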

Similarly we have: 

Lemma 6.2 Let $(\Omega_1^{(j)}, \ldots, \Omega_n^{(j)})_{j=1}^k$ be $k$ collections of finite probability spaces. Let $((\mathcal{X}_i^j)_{i=1}^n : j = 1, \ldots, k)$
be $k$ ensembles of collections of random variables such that $((\mathcal{X}_i^j)_{j=1}^k)_{i=1}^n$ are independent
and $\mathcal{X}_i^j$ is a basis for the functions in $\Omega_i^{(j)}$. Suppose further that for all $i$ it holds that
$\rho(\Omega_i^{(j)} : 1 \le j \le k) \le \rho$.

Let $P_1, \ldots, P_k$ be $k$ multi-linear polynomials. Let $\gamma$ be chosen sufficiently close to $0$ so that

$$\gamma \le 1 - (1 - \epsilon)^{\log \rho/(\log \epsilon + \log \rho)}.$$

Then: 



In particular, there exists an absolute constant $C$ such that it suffices to take

$$\gamma = C\frac{(1-\rho)\epsilon}{\log(1/\epsilon)}.$$

Proof: The proof follows the proof of the previous lemma. □ 
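
6.2 Bilinear Gaussian Bounds

In this section we prove the bilinear stability bound. We repeat the statement of Theorem 1.14
with more explicit dependency on the influences.

Theorem 6.3 Let $(\Omega_i^{(1)} \times \Omega_i^{(2)}, P_i)$ be a sequence of correlated spaces such that for each $i$ the
minimum $P_i$ probability of any atom in $\Omega_i^{(1)} \times \Omega_i^{(2)}$ is at least $\alpha \le 1/2$ and such that $\rho(\Omega_i^{(1)}, \Omega_i^{(2)}; P_i) \le \rho$
for all $i$. Then for every $\epsilon > 0$ there exists a $\tau = \tau(\epsilon) < 1/2$ such that if $f : \prod_{i=1}^n \Omega_i^{(1)} \to [0, 1]$ and
$g : \prod_{i=1}^n \Omega_i^{(2)} \to [0, 1]$ satisfy

$$\max\left(\mathrm{Inf}_i^{\le \log(1/\tau)/\log(1/\alpha)}(f), \mathrm{Inf}_i^{\le \log(1/\tau)/\log(1/\alpha)}(g)\right) \le \tau \qquad (41)$$

for all $i$ (see Definition 3.7 for the definition of low-degree influence) then

$$\underline{\Gamma}_\rho(E[f], E[g]) - \epsilon \le E[fg] \le \overline{\Gamma}_\rho(E[f], E[g]) + \epsilon. \qquad (42)$$

Moreover, there exists an absolute constant $C$ such that one may take

$$\tau = \epsilon^{C \frac{\log(1/\alpha)\log(1/\epsilon)}{(1-\rho)\epsilon}}. \qquad (43)$$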



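Proof: Write $\mu = E[f]$ and $\nu = E[g]$ and $K = \log(1/\alpha)$. As discussed in Section 3.2 let $\mathcal{X}$ be the
sequence of ensembles such that $\mathcal{X}_i$ spans the functions on $\Omega_i = \Omega_i^{(1)} \times \Omega_i^{(2)}$, $\mathcal{X}_i$ spans functions
on $\Omega_i^{(1)}$ and $\mathcal{Y}_i$ spans the functions on $\Omega_i^{(2)}$. We now express $f$ and $g$ as multilinear polynomials $P$
and $Q$ of $\mathcal{X}$ and $\mathcal{Y}$. Let $\gamma > 0$ be chosen so that

$$|E[P(\mathcal{X})Q(\mathcal{Y})] - E[T_{1-\gamma}P(\mathcal{X})\, T_{1-\gamma}Q(\mathcal{Y})]| \le \epsilon/2.$$

Note that by Lemma 6.1 it follows that we may take $\gamma = \Omega(\epsilon(1-\rho)/\log(1/\epsilon))$. Thus it suffices to
prove the bound stated in the theorem for $T_{1-\gamma}P(\mathcal{X})$ and $T_{1-\gamma}Q(\mathcal{Y})$.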
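
We use the invariance principle under hypothesis H3. Let $\mathcal{G}$ and $\mathcal{H}$ be two Gaussian ensembles
such that for all $i$ the covariance matrix of $\mathcal{H}_i$ and $\mathcal{G}_i$ is identical to the covariance matrix of $\mathcal{X}_i$
and $\mathcal{Y}_i$ and such that $(\mathcal{G}_i, \mathcal{H}_i)$ are independent. Clearly:

$$E[T_{1-\gamma}P(\mathcal{X})\, T_{1-\gamma}Q(\mathcal{Y})] = E[T_{1-\gamma}P(\mathcal{G})\, T_{1-\gamma}Q(\mathcal{H})].$$

Since $(P(\mathcal{X}), Q(\mathcal{Y}))$ takes values in $[0, 1]^2$ the same is true for

$$(\tilde{P}, \tilde{Q}) = ((T_{1-\gamma}P)(\mathcal{X}), (T_{1-\gamma}Q)(\mathcal{Y})).$$

In other words, $E[\zeta(\tilde{P}, \tilde{Q})] = 0$, where $\zeta$ is the function in (34). Writing

$$(\bar{P}, \bar{Q}) = ((T_{1-\gamma}P)(\mathcal{G}), (T_{1-\gamma}Q)(\mathcal{H})),$$

we conclude from Theorem 4.2 that $E[\zeta(\bar{P}, \bar{Q})] \le \tau^{\Omega(\gamma/K)}$. That is, $\|(\bar{P}, \bar{Q}) - (P', Q')\|_2^2 \le \tau^{\Omega(\gamma/K)}$,
where $P'$ is the function of $\bar{P}$ defined by

$$P' = \begin{cases} 0 & \text{if } \bar{P} \le 0, \\ \bar{P} & \text{if } \bar{P} \in [0, 1], \\ 1 & \text{if } \bar{P} \ge 1, \end{cases}$$

and $Q'$ is defined similarly. Now using Cauchy-Schwarz it is easy to see that

$$|E[\bar{P}\bar{Q}] - E[P'Q']| \le \tau^{\Omega(\gamma/K)}.$$

Write $\mu' = E[P']$ and $\nu' = E[Q']$. Then using the Gaussian Corollary 5.1 we obtain that

$$E[P'Q'] \le \overline{\Gamma}_\rho(\mu', \nu').$$

From Cauchy-Schwarz it follows that $|\mu - \mu'| \le \tau^{\Omega(\gamma/K)}$ and similarly for $\nu, \nu'$. It is immediate
to check that

$$|\overline{\Gamma}_\rho(\mu, \nu) - \overline{\Gamma}_\rho(\mu', \nu')| \le |\mu - \mu'| + |\nu - \nu'| \le \tau^{\Omega(\gamma/K)}.$$

Thus we have

$$E[PQ] \le \overline{\Gamma}_\rho(\mu, \nu) + \tau^{\Omega(\gamma/K)} + \epsilon/2.$$

Taking $\tau$ as in (43) yields

$$\tau^{\Omega(\gamma/K)} \le \epsilon/2,$$

and thus we obtain the upper bound in (42). The proof of the lower bound is identical. □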
The following proposition completes the proof of (12).

Proposition 6.4 Let $(\prod_{j=1}^k \Omega_i^{(j)}, P_i)$, $1 \le i \le n$ be a sequence of correlated spaces such that
for each $i$ the minimum $P_i$ probability of any atom in $\Omega_i$ is at least $\alpha \le 1/2$ and such that
$\rho(\Omega_i^{(1)}, \ldots, \Omega_i^{(k)}) \le \rho < 1$ for all $i$.

Then for every $\epsilon > 0$ there exists a $\tau > 0$ such that if $f_j : \prod_{i=1}^n \Omega_i^{(j)} \to [0, 1]$ for $1 \le j \le k$
satisfy

$$\max\left(\mathrm{Inf}_i^{\le \log(1/\tau)/\log(1/\alpha)}(f_j)\right) \le \tau$$

for all $i$ and $j$ (see Definition 3.7 for the definition of low-degree influence) then it holds that

$$\underline{\Gamma}_\rho(E[f_1], \ldots, E[f_k]) - \epsilon \le E\left[\prod_{i=1}^k f_i\right] \le \overline{\Gamma}_\rho(E[f_1], \ldots, E[f_k]) + \epsilon.$$

There exists an absolute constant $C$ such that one may take

$$\tau = \epsilon^{C \frac{\log(1/\alpha)\log(1/\epsilon)}{(1-\rho)\epsilon}}. \qquad (44)$$
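
The proof uses the following lemma, see e.g. [30].

Lemma 6.5 Let $f_1, \ldots, f_k : \Omega^n \to [0, 1]$. Then for all $j$:

$$\mathrm{Inf}_j\left(\prod_{i=1}^k f_i\right) \le k\sum_{i=1}^k \mathrm{Inf}_j(f_i).$$

Proof: The proof is based on the fact that

$$\mathrm{Var}\left[\prod_{i=1}^k f_i\right] = \frac{1}{2}E\left[\left(\prod_{i=1}^k f_i(X) - \prod_{i=1}^k f_i(Y)\right)^2\right],$$

where $X$ and $Y$ are independent. Now

$$E\left[\left(\prod_{i=1}^k f_i(X) - \prod_{i=1}^k f_i(Y)\right)^2\right] \le E\left[\left(\sum_{i=1}^k |f_i(X) - f_i(Y)|\right)^2\right] \le k\sum_{i=1}^k E[(f_i(X) - f_i(Y))^2] = 2k\sum_{i=1}^k \mathrm{Var}[f_i].$$

Applying this bound to the $j$-th coordinate with all other coordinates held fixed, and then averaging
over the other coordinates, replaces each variance by the influence $\mathrm{Inf}_j$, which gives the desired result. □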
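The inequality of the lemma can be tested by brute force in a small special case. The Python sketch below is an illustration, not part of the paper: it takes $\Omega = \{0, 1\}$ with the uniform measure, draws random $[0,1]$-valued functions, and checks $\mathrm{Inf}_j(\prod_i f_i) \le k \sum_i \mathrm{Inf}_j(f_i)$ with $\mathrm{Inf}_j(f) = E[\mathrm{Var}_j f]$. All function names and parameters are assumptions made for the example.

```python
import itertools
import math
import random


def influence(f: dict, n: int, j: int) -> float:
    """Inf_j(f) = E[Var_j f] for f on {0,1}^n with the uniform measure."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        y = x[:j] + (1 - x[j],) + x[j + 1:]
        # Variance of a fair two-point distribution on {f(x), f(y)}.
        total += (f[x] - f[y]) ** 2 / 4.0
    return total / 2 ** n


def random_function(n: int, rng: random.Random) -> dict:
    """A random [0,1]-valued function on the cube {0,1}^n."""
    return {x: rng.random() for x in itertools.product((0, 1), repeat=n)}


def product_function(fs: list) -> dict:
    """Pointwise product of functions stored as dictionaries on one domain."""
    return {x: math.prod(f[x] for f in fs) for x in fs[0]}


def check_lemma(n: int = 4, k: int = 3, trials: int = 200, seed: int = 0) -> bool:
    """Return True if the inequality holds in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        fs = [random_function(n, rng) for _ in range(k)]
        g = product_function(fs)
        for j in range(n):
            bound = k * sum(influence(f, n, j) for f in fs)
            if influence(g, n, j) > bound + 1e-12:
                return False
    return True


if __name__ == "__main__":
    print("inequality held in all trials:", check_lemma())
```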

Proof: [Of Proposition 6.4] The proof follows by applying Theorem 6.3 iteratively for the functions
$f_1, f_1f_2, f_1f_2f_3, \ldots$ and using the previous lemma and the fact that $\underline{\Gamma}_\rho()$ and $\overline{\Gamma}_\rho()$ have Lipschitz
constant 1 in each coordinate. □

6.3 Multi-linear bounds 

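Next we prove (14).

Theorem 6.6 Let $(\prod_{j=1}^k \Omega_i^{(j)}, P_i)$, $1 \le i \le n$ be a sequence of correlated spaces such that for each $i$
the minimum $P_i$ probability of any atom in $\Omega_i$ is at least $\alpha \le 1/2$ and such that $\rho(\Omega_i^{(1)}, \ldots, \Omega_i^{(k)}) \le
\rho < 1$ for all $i$. Suppose furthermore that for all $i$ and $j \ne j'$, it holds that $\rho(\Omega_i^{(j)}, \Omega_i^{(j')}) = 0$, i.e.,
$(\prod_{j=1}^k \Omega_i^{(j)}, P_i)$ is pairwise independent.

Then for every $\epsilon > 0$ there exists a $\tau = \tau(\epsilon) < 1/2$ such that if $f_j : \prod_{i=1}^n \Omega_i^{(j)} \to [0, 1]$ for
$1 \le j \le k$ satisfy

$$\max\left(\mathrm{Inf}_i^{\le \log(1/\tau)/\log(1/\alpha)}(f_j)\right) \le \tau \qquad (45)$$

for all $i$ and $j$, then

$$\left|E\left[\prod_{i=1}^k f_i\right] - \prod_{i=1}^k E[f_i]\right| \le \epsilon. \qquad (46)$$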

There exists an absolute constant C such that one may take 

r Klo S (l/e) 

T = e ^~»-> (47) 

Proof: Note that for all $i$ and all $\gamma$ we have that the functions $f_i$ and $T_{1-\gamma}f_i$ are $[0, 1]$ valued
functions. Therefore, as in the previous proof we obtain by Lemma 6.2 that

$$\left|E\left[\prod_{i=1}^k f_i\right] - E\left[\prod_{i=1}^k T_{1-\gamma}f_i\right]\right| \le \epsilon/4$$

for $\gamma = \Omega(\epsilon(1-\rho)/\log(1/\epsilon))$. Thus it suffices to prove the bound stated in the theorem for the
functions $T_{1-\gamma}f_i$.

We now use the invariance principle. Recall that the $f_i$ may be written as a multi-linear
polynomial $P_i$ of an ensemble $\mathcal{X}_i$. Let $\mathcal{G}_i$, $1 \le i \le k$ denote Gaussian ensembles with the same
covariances as $\mathcal{X}_i$, $1 \le i \le k$ and let $(g_1, \ldots, g_k)$ be the multi-linear polynomials $(P_1, \ldots, P_k)$ applied
to $(\mathcal{G}_1, \ldots, \mathcal{G}_k)$. Let $h_i = f_{[0,1]}(T_{1-\gamma}g_i)$. By the invariance principle Corollary 4.3 we have:

$$\left|E\left[\prod_{i=1}^k T_{1-\gamma}f_i\right] - E\left[\prod_{i=1}^k h_i\right]\right| \le \epsilon/4.$$

Note that the $h_i$ are functions of ensembles of Gaussian random variables such that each pair of
ensembles is independent. Therefore, the $h_i$'s are independent which implies in turn that

$$E\left[\prod_{i=1}^k h_i\right] = \prod_{i=1}^k E[h_i].$$

Corollary 4.3 also implies that

$$|E[h_i] - E[f_i]| = |E[h_i] - E[T_{1-\gamma}f_i]| \le \epsilon/4k,$$

so

$$\left|\prod_{i=1}^k E[h_i] - \prod_{i=1}^k E[f_i]\right| \le \epsilon/4,$$

which concludes the proof. □

6.4 Relaxed influence conditions 

In this subsection we will relax the conditions imposed on the influence, i.e. we prove Proposition 1.15.
In particular we will show that in Theorem 1.14 with $k = 2$ it suffices to assume that for each
coordinate at least one of the functions has low influence. Similarly for $k > 2$ and s-wise independent
distributions it suffices to have that in each coordinate at most $s$ of the functions have large influence.

Lemma 6.7 Assume $\mu$ is an r-wise independent distribution on $\Omega^k$. Let $f_1, \ldots, f_k : \Omega^n \to [0, 1]$.
Let $S \subset [n]$ be a set of coordinates such that for each $i \in S$ at most $r$ of the functions $f_j$ have
$\mathrm{Inf}_i(f_j) > \epsilon$. Define

$$g_i(x) = E[f_i(Y) \mid Y_{[n] \setminus S} = x_{[n] \setminus S}].$$

Then the functions $g_i$ do not depend on the coordinates in $S$ and

$$\left|E\left[\prod_{i=1}^k f_i\right] - E\left[\prod_{i=1}^k g_i\right]\right| \le k|S|\sqrt{\epsilon}.$$

Proof: Recall that averaging over a subset of the variables preserves expected value. It also
maintains the property of taking values in $[0, 1]$ and decreases influences. Thus it suffices to prove
the claim for the case where $|S| = 1$. The general case then follows by induction.

So assume without loss of generality that $S = \{1\}$ consists of the first coordinate only and that
$f_{r+1}, \ldots, f_k$ all have $\mathrm{Inf}_1(f_j) \le \epsilon$, so that $E[(f_j - g_j)^2] \le \epsilon$ for $j > r$. Then by Cauchy-Schwarz
we have $E[|f_j - g_j|] \le \sqrt{\epsilon}$ for $j > r$ and using the fact that the functions are bounded in $[0, 1]$ we
obtain

$$\left|E\left[\prod_{i=1}^k f_i\right] - E\left[\prod_{i=1}^r f_i \prod_{i=r+1}^k g_i\right]\right| \le E\left[\left|\prod_{i=r+1}^k f_i - \prod_{i=r+1}^k g_i\right|\right] \le \sum_{i=r+1}^k E[|f_i - g_i|] \le k\sqrt{\epsilon}. \qquad (48)$$

Let us write $E_1$ for the expected value with respect to the first variable. Recalling that the $g_i$ do
not depend on the first variable and that the $f_i$ are r-wise independent we obtain that

$$E_1\left[\prod_{i=1}^r f_i \prod_{i=r+1}^k g_i\right] = \prod_{i=r+1}^k g_i\, E_1\left[\prod_{i=1}^r f_i\right] = \prod_{i=r+1}^k g_i \prod_{i=1}^r E_1[f_i] = \prod_{i=1}^r g_i \prod_{i=r+1}^k g_i = \prod_{i=1}^k g_i.$$

This implies that

$$E\left[\prod_{i=1}^r f_i \prod_{i=r+1}^k g_i\right] = E\left[\prod_{i=1}^k g_i\right], \qquad (49)$$

and the proof follows from (48) and (49). □

We now prove that condition (15) suffices instead of (11).

Lemma 6.8 Theorem 6.3 holds with the condition

$$\max_i\left(\min\left(\mathrm{Inf}_i^{\le \log^2(1/\tau)/\log(1/\alpha)}(f), \mathrm{Inf}_i^{\le \log^2(1/\tau)/\log(1/\alpha)}(g)\right)\right) \le \tau \qquad (50)$$

instead of (41).
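Proof: Let

$$\gamma = C_1\frac{(1-\rho)\epsilon}{\log(1/\epsilon)}, \qquad \tau = \epsilon^{C_2 \frac{\log(1/\alpha)\log(1/\epsilon)}{(1-\rho)\epsilon}}$$

and

$$R = \frac{\log(1/\tau)}{\log(1/\alpha)}.$$

From the proof of Theorem 6.3 it follows that the constants $C_1$ and $C_2$ may be chosen so that

$$(1-\gamma)^R \le \epsilon/4, \qquad |E[fg] - E[T_{1-\gamma}f\, T_{1-\gamma}g]| \le \epsilon/4.$$

Moreover the conclusion of Theorem 6.3 holds with error at most $\epsilon/4$ for any pair of functions $f$
and $g$ if all influences of $f$ and $g$ satisfy $\mathrm{Inf}_i^{\le R} \le \tau$. Let

$$R' = \frac{\log^2(1/\tau)}{\log(1/\alpha)}$$

and choose $C_3$ large enough so that

$$\tau' = \epsilon^{C_3 \frac{\log(1/\alpha)\log(1/\epsilon)}{(1-\rho)\epsilon}}$$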

satisfy 

Assume that $f$ and $g$ satisfy

$$\max_i\left(\min\left(\mathrm{Inf}_i^{\le R'}(f), \mathrm{Inf}_i^{\le R'}(g)\right)\right) \le \tau'.$$

We will show that the statement of the theorem holds for $f$ and $g$. For this let

$$S_f = \{i : \mathrm{Inf}_i^{\le R}(f) > \tau\}, \qquad S_g = \{i : \mathrm{Inf}_i^{\le R}(g) > \tau\}.$$

Let $S = S_f \cup S_g$. Since $R' \ge R$ and $\tau' \le \tau$, the sets $S_f$ and $S_g$ are disjoint. Moreover, both $S_f$
and $S_g$ are of size at most $R/\tau$. Also, if $i \in S$ and $\mathrm{Inf}_i^{\le R}(f) > \tau$ then $\mathrm{Inf}_i^{\le R'}(g) \le \tau'$ and therefore
$\mathrm{Inf}_i(g) \le \tau' + (1-\gamma)^{2R'}$. In other words, for all $i \in S$ we have $\min(\mathrm{Inf}_i(g), \mathrm{Inf}_i(f)) \le \tau' + (1-\gamma)^{2R'}$.
Letting $S = S_f \cup S_g$ and applying Lemma 6.7 with

$$g'(y') = E[g(Y) \mid Y_{[n] \setminus S} = y'_{[n] \setminus S}], \qquad f'(x') = E[f(X) \mid X_{[n] \setminus S} = x'_{[n] \setminus S}],$$

we obtain that

\B[f(x)g(y)] - B[f'(x)g'(y)]\ < ^^r' + (l- 7 P' < J. (51) 

Note that the functions $f'$ and $g'$ satisfy that $\max(\mathrm{Inf}_i^{\le R}(f'), \mathrm{Inf}_i^{\le R}(g')) \le \tau$ for all $i$. This implies
that the results of Theorem 6.3 hold for $f'$ and $g'$. This together with (51) implies the desired
result. □

The proof of the relaxed condition on influences in Theorem 6.6 is similar.

Lemma 6.9 Assume the setup of Theorem 6.6 where $(\prod_{j=1}^k \Omega_i^{(j)}, P_i)$ is s-wise independent for all
$i$. Then the conclusion of the theorem holds when the following condition:

$$\forall i, \quad \left|\left\{j : \mathrm{Inf}_i^{\le \log^2(1/\tau)/\log(1/\alpha)}(f_j) > \tau\right\}\right| \le s \qquad (52)$$

replaces condition (45).

Proof: Again we start by looking at $T_{1-\gamma}f_j$ where $\gamma = \Omega(\epsilon(1-\rho)/\log(1/\epsilon))$. We let $\tau'$ and $R'$ be
chosen so that

k ^T> + (i-iy R '<l. 

The set $S$ will consist of all coordinates $i$ where at least one of the functions $f_j$ has $\mathrm{Inf}_i^{\le R}(f_j) > \tau$.
The rest of the proof is similar. □

6.5 A Recursive Argument 

Here we show how Proposition 1.16 follows from Theorem 1.14. The proof uses the following lemma.

Lemma 6.10 Let $(\Omega_1, \mu_1), \ldots, (\Omega_n, \mu_n)$ be finite probability spaces such that for all $i$ the minimum
probability of any atom in $\Omega_i$ is at least $\alpha \le 1/2$. Let $f : \prod_{i=1}^n \Omega_i \to [0, 1]$ and suppose that
$\mathrm{Inf}_i(f) \ge \tau$. Let $S = \{i\}$. Then

$$E[\overline{f}^S] \ge E[f] + \alpha\tau, \qquad E[\underline{f}^S] \le E[f] - \alpha\tau.$$

Proof: Note that if $g : \Omega_i \to [0, 1]$ satisfies $\mathrm{Var}[g] \ge a$ then

$$(\max_x g(x) - E[g])^2 \ge \alpha a, \qquad (\min_x g(x) - E[g])^2 \ge \alpha a.$$

Therefore:

$$E[\overline{f}^S - f] \ge E[(\overline{f}^S - f)^2] \ge \alpha\, \mathrm{Inf}_i(f) \ge \alpha\tau,$$

which implies the first inequality. The second inequality is proved similarly. □
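The single-coordinate fact used in this proof, namely that both $(\max_x g(x) - E[g])^2$ and $(\min_x g(x) - E[g])^2$ are at least $\alpha\,\mathrm{Var}[g]$ when every atom has probability at least $\alpha$, can be probed on random finite spaces. The Python sketch below is an illustrative check only; the helper names and the way random spaces are generated are assumptions.

```python
import random


def deviation_check(probs: list, values: list) -> tuple:
    """Return ((max g - E g)^2, (E g - min g)^2, alpha * Var[g])."""
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    alpha = min(probs)  # minimum probability of an atom
    return (max(values) - mean) ** 2, (mean - min(values)) ** 2, alpha * var


def random_space(size: int, rng: random.Random) -> tuple:
    """Random atom probabilities (summing to 1) and random values in [0, 1]."""
    weights = [rng.uniform(0.1, 1.0) for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights], [rng.random() for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(1)
    probs, values = random_space(5, rng)
    print(deviation_check(probs, values))
```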
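
We now prove Proposition 1.16.

Proof: The set $T$ is defined recursively as follows. We let $T_0 = \emptyset$, $\overline{a}_0 = \underline{a}_0 = 0$. Then we repeat
the following: If all of the functions

$$\overline{f}_1^{T_t}, \ldots, \overline{f}_k^{T_t}, \underline{f}_1^{T_t}, \ldots, \underline{f}_k^{T_t}$$

have influences lower than $\tau$, then we halt and let $T = T_t$. Otherwise, there exists at least one $i$
and one $j$ such that either $\mathrm{Inf}_i(\overline{f}_j^{T_t}) \ge \tau$ or $\mathrm{Inf}_i(\underline{f}_j^{T_t}) \ge \tau$. We then let $T_{t+1} = T_t \cup \{i\}$. In the first
case we let $\overline{a}_{t+1} = \overline{a}_t + 1$, $\underline{a}_{t+1} = \underline{a}_t$. In the second case we let $\overline{a}_{t+1} = \overline{a}_t$, $\underline{a}_{t+1} = \underline{a}_t + 1$.
Note that by Lemma 6.10 this process must terminate within

$$\frac{2k}{\alpha\tau}$$

steps since

$$k \ge E\left[\sum_{j=1}^k \overline{f}_j^{T_t}\right] \ge \alpha\tau\,\overline{a}_t, \qquad 0 \le E\left[\sum_{j=1}^k \underline{f}_j^{T_t}\right] \le k - \alpha\tau\,\underline{a}_t,$$

and $\overline{a}_t + \underline{a}_t \ge t$. □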

7 Applications to Social Choice 

In this section we apply Theorem 16.21 to the two social choice models. 

7.1 ρ for samples of votes

In the first social choice example we consider Example 2.6. The correlated probability spaces are
the ones given by

$$\Omega_V = \{\{x = 1\}, \{x = -1\}\},$$

representing the intended vote and

$$\Omega_S = \{\{(x = 1, y = 1)\}, \{(x = -1, y = 1)\}, \{y = 0\}\}$$

representing the sampled status.

In order to calculate $\rho(\Omega_V, \Omega_S)$ it suffices by Lemma 2.8 to calculate $\sqrt{E[(Tf)^2]}$ where $f(x, y) = x$
is the (only) $\Omega_V$ measurable function with $E[f] = 0$ and $E[f^2] = 1$. We see that $Tf(x, y) = 0$ if $y = 0$ and
$Tf(x, y) = x$ when $y \ne 0$. Therefore

$$\sqrt{E[(Tf)^2]} = \rho^{1/2}.$$

Lemma 7.1

$$\rho(\Omega_V, \Omega_S) = \rho^{1/2}.$$
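The computation behind Lemma 7.1 can be reproduced by direct enumeration. The Python sketch below is an illustration under stated assumptions: each voter votes $\pm 1$ uniformly and is sampled independently with probability `p` (the quantity written $\rho$ in the text), and the function name is hypothetical. It computes $\sqrt{E[(Tf)^2]}$ for $f(x, y) = x$ by conditioning on the atoms of the sampled-status space.

```python
import math
from collections import defaultdict


def correlation_vote_sample(p: float) -> float:
    """sqrt(E[(Tf)^2]) for f(x, y) = x, conditioning on the sampled status."""
    mass = defaultdict(float)      # probability of each atom of Omega_S
    weighted = defaultdict(float)  # E[f ; atom] for each atom of Omega_S
    for x in (1, -1):
        for y in (0, 1):
            prob = 0.5 * (p if y == 1 else 1.0 - p)
            # A sampled voter reveals x; an unsampled voter reveals nothing.
            atom = (x, 1) if y == 1 else "unsampled"
            mass[atom] += prob
            weighted[atom] += prob * x
    # E[(Tf)^2] = sum over atoms of P[atom] * (E[f | atom])^2.
    second_moment = sum(
        weighted[a] ** 2 / mass[a] for a in mass if mass[a] > 0.0
    )
    return math.sqrt(second_moment)


if __name__ == "__main__":
    p = 0.3  # illustrative sampling probability
    print(correlation_vote_sample(p), math.sqrt(p))
```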

7.2 Predictability of Binary Vote 

Here we prove Theorem 1.1.

Proof: The proof follows directly from Proposition 1.15 and Lemma 7.1 as

$$\Gamma_{\sqrt{\rho}}(1/2, 1/2) = \frac{1}{4} + \frac{1}{2\pi}\arcsin\sqrt{\rho}.$$

□
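The closed form for $\Gamma$ at $(1/2, 1/2)$ is the classical quadrant probability of a bivariate normal, and it can be confirmed numerically. The Python sketch below is an illustrative check rather than part of the proof; it assumes $\Gamma_r(1/2, 1/2) = P[X \le 0, Y \le 0]$ for standard normals with correlation $r$, and the function names are hypothetical.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def quadrant_probability(r: float, steps: int = 20000) -> float:
    """P[X <= 0, Y <= 0] for a standard bivariate normal with correlation r.

    Conditions on X = x, where P[Y <= 0 | X = x] = Phi(-r * x / sqrt(1 - r^2)),
    and integrates over x in [-12, 0] with composite Simpson's rule.
    """
    lo, hi = -12.0, 0.0
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    s = math.sqrt(1.0 - r * r)

    def g(x: float) -> float:
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return density * Phi(-r * x / s)

    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0


def closed_form(r: float) -> float:
    """The formula 1/4 + arcsin(r) / (2 pi)."""
    return 0.25 + math.asin(r) / (2.0 * math.pi)


if __name__ == "__main__":
    p = 0.25  # illustrative sampling probability, so the correlation is sqrt(p)
    r = math.sqrt(p)
    print(quadrant_probability(r), closed_form(r))
```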

7.3 ρ in Condorcet voting

In the context of Condorcet voting, $\Omega$ is given by $S_k$, the group of permutations on $k$ elements and
$\mu$ is the uniform measure. We write $\binom{[k]}{2}$ for the collection of subsets of $[k]$ of size 2.

Definition 7.2 Let $Q \subset \binom{[k]}{2}$ and define $R_Q : \Omega \to \{0, 1\}^Q$ by letting $(R_Q(\sigma))_{i<j} = 1$ if $\sigma(i) < \sigma(j)$
and $(R_Q(\sigma))_{i<j} = 0$ if $\sigma(i) > \sigma(j)$. $R_Q$ summarizes the pairwise relations in the permutation $\sigma$ for
pairs in $Q$.

Given a subset $Q \subset \binom{[k]}{2}$ we define

$$\Omega_Q = \{\{\sigma : R_Q(\sigma) = x\} : x \in \{0, 1\}^Q\}.$$

Thus $\Omega_Q$ is the coarsening of $\Omega$ summarizing the information about pairwise relations in $Q$.

We will mostly be interested in $\rho(\Omega_Q, \Omega_{i<j})$ where $(i < j) \notin Q$.

Lemma 7.3 Suppose $Q = \{(1 > 2), (1 > 3), \ldots, (1 > r)\}$ then

$$\rho(\Omega_Q, \Omega_{1>(r+1)}) = \sqrt{\frac{r-1}{3(r+1)}} < \frac{1}{\sqrt{3}}.$$

Proof: We use Lemma 2.8 again. The space $\Omega_{1>(r+1)}$ has a unique function with $E[f] = 0$ and
$E[f^2] = 1$. This is the function that satisfies $f(\sigma) = 1$ when $\sigma(1) > \sigma(r + 1)$ and $f(\sigma) = -1$ if
$\sigma(1) < \sigma(r + 1)$.

The conditional probability that $\sigma(1) > \sigma(r + 1)$ given that $s$ of the inequalities $\sigma(1) >
\sigma(2), \ldots, \sigma(1) > \sigma(r)$ hold is

$$\frac{s + 1}{r + 1}.$$

Therefore the conditional expectation of $f$ under this conditioning is:

$$\frac{(s + 1) - (r - s)}{r + 1} = \frac{2s + 1 - r}{r + 1}.$$

Noting that the number of inequalities satisfied is uniform in the range $\{0, \ldots, r - 1\}$ we see that

$$E[(Tf)^2] = \frac{1}{r}\sum_{s=0}^{r-1}\frac{(2s + 1 - r)^2}{(r + 1)^2}$$

$$= \frac{1}{r(r+1)^2}\left(4\,\frac{2(r-1)^3 + 3(r-1)^2 + (r-1)}{6} - 4(r-1)\frac{(r-1)^2 + (r-1)}{2} + r(r-1)^2\right) = \frac{r - 1}{3(r + 1)}.$$

Therefore:

$$\rho(\Omega_Q, \Omega_{1>(r+1)}) = \sqrt{\frac{r - 1}{3(r + 1)}}.$$

□
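The value $E[(Tf)^2] = \frac{r-1}{3(r+1)}$ can be confirmed by exhaustive enumeration for small $r$. The Python sketch below is an illustration, not the paper's method: it enumerates all rankings of candidates $1, \ldots, r+1$, conditions on the full vector of relations $(1 > 2), \ldots, (1 > r)$, and uses exact rational arithmetic. The function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations


def squared_correlation(r: int) -> Fraction:
    """E[(Tf)^2] for Q = {(1>2), ..., (1>r)} and the pair (1 > r+1)."""
    count = defaultdict(int)  # number of rankings in each atom of Omega_Q
    total = defaultdict(int)  # sum of f over each atom of Omega_Q
    n_rankings = 0
    # sigma[c] is the rank of candidate c + 1 among the r + 1 candidates.
    for sigma in permutations(range(r + 1)):
        atom = tuple(sigma[0] > sigma[j] for j in range(1, r))
        f = 1 if sigma[0] > sigma[r] else -1
        count[atom] += 1
        total[atom] += f
        n_rankings += 1
    # E[(Tf)^2] = sum over atoms of P[atom] * (E[f | atom])^2.
    return sum(
        Fraction(total[a] ** 2, count[a]) for a in count
    ) / n_rankings


if __name__ == "__main__":
    for r in range(1, 6):
        print(r, squared_correlation(r), Fraction(r - 1, 3 * (r + 1)))
```

Only the relative order of the $r + 1$ candidates involved matters, so enumerating rankings of these candidates alone is enough.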

7.4 Condorcet Paradox

We now prove Theorem 1.5 dealing with Condorcet paradoxes.

Proof: We wish to bound, in terms of the number of candidates $k$, the asymptotic
probability that there is a unique maximum in Condorcet aggregation. Clearly this probability
equals $k$ times the probability that candidate 1 is the maximum.

Recall that the votes are aggregated as follows. Let $f : \{-1, 1\}^n \to \{-1, 1\}$ be an anti-symmetric
function with $E[f] = 0$ and low influences. Let $\sigma_1, \ldots, \sigma_n \in S_k$ denote the $n$ rankings of the
individual voters. Recall that we denote $x^{a>b}(i) = 1$ if $\sigma_i(a) > \sigma_i(b)$ and $x^{a>b}(i) = -1$ if $\sigma_i(a) < \sigma_i(b)$.
Note that $x^{b>a} = -x^{a>b}$. We recall further that the binary decision between each pair of candidates
is performed via the anti-symmetric function $f$, so that $f(-x) = -f(x)$
for all $x \in \{-1, 1\}^n$. Finally, the tournament $G_k = G_k(\sigma; f)$ is defined by having $(a > b) \in G_k$ if
and only if $f(x^{a>b}) = 1$.

In order to obtain an upper bound on the probability that 1 is the unique maximum, define
$f^{a,b}(\sigma_1, \ldots, \sigma_n) = (1 + f(x^{a>b}))/2$. Then the probability that 1 is the maximum is given by:

$$E\left[\prod_{a=2}^k f^{1,a}\right].$$

Using (12) we obtain that

Eirp 1 -]^ 

a=2 

where $\epsilon \to 0$ as $\tau \to 0$; there are $k - 1$ expected values all given by $1/2$ as $E[f^{1,a}] = 1/2$ for all $a$;
the $k - 2$ values of $\rho$ are all bounded by $1/\sqrt{3}$ by Lemma 7.3.
By Lemma 5.4 we now obtain

$$E\left[\prod_{a=2}^k f^{1,a}\right] \le k^{-2+o(1)}.$$

Taking the union bound on the $k$ possible maximal values we obtain that in Condorcet voting the
probability that there is a unique max is at most $k^{-1+o(1)}$ as needed. □

7.5 Majority and Tightness 

The majority function shows the tightness of Theorem 1.1 and Theorem 1.5.

7.5.1 Tightness for Prediction

For the tightness of Theorem 1.1 write:

$$X = \frac{\sum_{i=1}^n x_i}{\sqrt{n}}, \qquad Y = \frac{\sum_{i=1}^n x_i y_i}{\sqrt{n}},$$

where $x_i$ is the intended vote of voter $i$ and $y_i = 1$ if voter $i$ was queried, $y_i = 0$ otherwise. $X$ is
the total bias of the actual vote and $Y$ the bias of the sample.

Note that $(X, Y)$ is asymptotically a normal vector with covariance matrix

$$\begin{pmatrix} 1 & \rho \\ \rho & \rho \end{pmatrix},$$

so that the correlation between $X$ and $Y$ is $\sqrt{\rho}$. Therefore asymptotically we obtain

$$E[\mathrm{sgn}(X)\,\mathrm{sgn}(Y)] = 4P[X > 0, Y > 0] - 1 = 4\Gamma_{\sqrt{\rho}}(1/2, 1/2) - 1 = \frac{2}{\pi}\arcsin\sqrt{\rho}.$$

7.5.2 Tightness in Condorcet Voting 

The tightness in Theorem 1.5 follows from [28] and [23]. We briefly sketch the main steps of the
proof.

For each $a$ and $b$ let

$$X^{a>b} = \frac{\sum_{i=1}^n x_i^{a>b}}{\sqrt{n}}$$

be the bias preference in a majority vote towards $a$. By the CLT, all the random variables $X^{a>b}$
are asymptotically $N(0, 1)$.

Consider the random variables $X^{1>2}, \ldots, X^{1>k}$. Note that this set of variables is exchangeable
(they are identically distributed under any permutation of their order). Moreover,

$$E[X^{1>2}X^{1>3}] = \frac{1}{n}\sum_{i=1}^n E[x_i^{1>2}x_i^{1>3}] = 1/3.$$

By the CLT,

$$\lim_{n \to \infty} P[X^{1>2} > 0, \ldots, X^{1>k} > 0] = P[N^{1>2} > 0, \ldots, N^{1>k} > 0],$$

where $(N^{1>a})$ is an exchangeable collection of normal $N(0, 1)$ random variables, the correlation
between each pair of which is $1/3$. The results of [28] imply that as $k \to \infty$:

$$P[N^{1>2} > 0, \ldots, N^{1>k} > 0] \sim 2\Gamma(2)\frac{\sqrt{2\pi\ln k}}{k^2}.$$

This in turn implies that the probability of a unique max for majority voting for large $k$ as $n \to \infty$
is given by:

$$(2 + o(1))\Gamma(2)\frac{\sqrt{2\pi\ln k}}{k},$$

showing the tightness of the result up to sub-polynomial terms.
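The covariance $E[x_i^{1>2}x_i^{1>3}] = 1/3$ used in this sketch of the proof comes from a single voter with a uniformly random ranking, and it depends only on the relative order of candidates 1, 2 and 3. The short Python sketch below, included as an illustration with a hypothetical function name, verifies it by enumerating the six rankings.

```python
from fractions import Fraction
from itertools import permutations


def pairwise_covariance() -> Fraction:
    """E[x^{1>2} x^{1>3}] for a uniformly random ranking of three candidates."""
    rankings = list(permutations(range(3)))  # sigma[c] is the rank of candidate c + 1
    total = 0
    for sigma in rankings:
        x12 = 1 if sigma[0] > sigma[1] else -1
        x13 = 1 if sigma[0] > sigma[2] else -1
        total += x12 * x13
    return Fraction(total, len(rankings))


if __name__ == "__main__":
    print(pairwise_covariance())
```

The product is $+1$ exactly when candidate 1 is ranked above both or below both of the other two, which happens for four of the six rankings.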

7.5.3 The probability that majority will result in linear order 

Here we prove Proposition 1.6 and show that the probability that majority will result in a linear
order is $\exp(-\Theta(k^{5/3}))$. We find this asymptotic behavior quite surprising. Indeed, given the
previous results that the probability that there is a unique max is $k^{-1+o(1)}$, one may expect that
the probability that the order is linear would be

$$k^{-1+o(1)}(k-1)^{-1+o(1)} \cdots = (k!)^{-1+o(1)}.$$

However, it turns out that there is a strong negative correlation between the event that there is a
unique maximum among the $k$ candidates and that among the other candidates there is a unique
max.

Proof: We use the multi-dimensional CLT. Let

$$X^{a>b} = \frac{1}{\sqrt{n}}\left(|\{\sigma : \sigma(a) > \sigma(b)\}| - |\{\sigma : \sigma(b) > \sigma(a)\}|\right).$$

By the CLT at the limit the collection of variables $(X^{a>b})_{a \ne b}$ converges to a joint Gaussian vector
$(N^{a>b})_{a \ne b}$ satisfying for all distinct $a, b, c, d$:

$$N^{a>b} = -N^{b>a}, \qquad \mathrm{Cov}[N^{a>b}, N^{a>c}] = \frac{1}{3}, \qquad \mathrm{Cov}[N^{a>b}, N^{c>d}] = 0,$$

and $N^{a>b} \sim N(0, 1)$ for all $a$ and $b$.

We are interested in providing bounds on

$$P[\forall a > b : N^{a>b} > 0]$$

as the probability that the resulting tournament is an order is obtained by multiplying by a $k! =
\exp(\Theta(k \log k))$ factor.

We claim that there exist independent $N(0, 1)$ random variables $X_a$ for $1 \le a \le k$ and $Z^{a>b}$ for
$1 \le a \ne b \le k$ such that

$$N^{a>b} = \frac{1}{\sqrt{3}}\left(X_a - X_b + Z^{a>b}\right)$$

(where $Z^{a>b} = -Z^{b>a}$). This follows from the fact that the joint distribution of Gaussian random
variables is determined by the covariance matrix (this is noted in the literature in [23]).
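Since a Gaussian vector is determined by its covariances, the claimed representation can be checked by computing covariances of the linear combinations directly. The Python sketch below is an illustration with hypothetical helper names: it stores the coefficients of $\sqrt{3}\,N^{a>b} = X_a - X_b + Z^{a>b}$ over the independent standard normals (one $Z$ variable per unordered pair, with a sign) and evaluates covariances exactly.

```python
from fractions import Fraction


def coefficients(a: int, b: int) -> dict:
    """Coefficients of sqrt(3) * N^{a>b} over the independent N(0,1) variables."""
    # Z^{a>b} = -Z^{b>a}: keep one variable per unordered pair, with a sign.
    sign = 1 if a < b else -1
    return {("X", a): 1, ("X", b): -1, ("Z", min(a, b), max(a, b)): sign}


def covariance(u: tuple, v: tuple) -> Fraction:
    """Cov[N^{u}, N^{v}] for ordered pairs u = (a, b) and v = (c, d)."""
    cu, cv = coefficients(*u), coefficients(*v)
    dot = sum(c * cv.get(key, 0) for key, c in cu.items())
    return Fraction(dot, 3)


if __name__ == "__main__":
    print(covariance((1, 2), (1, 2)))  # variance of N^{1>2}
    print(covariance((1, 2), (1, 3)))  # two pairs sharing the first candidate
    print(covariance((1, 2), (3, 4)))  # disjoint pairs
    print(covariance((1, 2), (2, 1)))  # N^{1>2} against N^{2>1}
```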

We now prove the upper bound. Let $\alpha$ be a constant to be chosen later. Note that for all $a$
and large enough $k$ it holds that:

$$P[|X_a| > k^{\alpha}] \le \exp(-\Omega(k^{2\alpha})).$$

Therefore the probability that for at least half of the $a$'s in the interval $[k/2, k]$ it holds that
$|X_a| > k^{\alpha}$ is at most

$$\exp(-\Omega(k^{1+2\alpha})).$$

Let's assume that at least half of the $a$'s in the interval $[k/2, k]$ satisfy that $|X_a| < k^{\alpha}$. We
claim that in this case the number $H_{k/4}[-k^{\alpha}, k^{\alpha}]$ of pairs $a > b$ such that $X_a, X_b \in [-k^{\alpha}, k^{\alpha}]$ and
$|X_a - X_b| < 1$ is $\Omega(k^{2-\alpha})$.

For the last claim partition the interval $[-k^{\alpha}, k^{\alpha}]$ into sub-intervals of length 1 and note that at
least $\Omega(k)$ of the points belong to sub-intervals which contain at least $\Omega(k^{1-\alpha})$ points. This implies
that the number of pairs $a > b$ satisfying $|X_a - X_b| < 1$ is $\Omega(k^{2-\alpha})$.

Note that for such a pair $a > b$, in order that $N^{a>b} > 0$ we need that $Z^{a>b} > -1$, which happens
with constant probability.

We conclude that given that half of the $X$'s fall in $[-k^{\alpha}, k^{\alpha}]$ the probability of a linear order is
bounded by

$$\exp(-\Omega(k^{2-\alpha})).$$

Thus overall we have bounded the probability by

$$\exp(-\Omega(k^{1+2\alpha})) + \exp(-\Omega(k^{2-\alpha})).$$

The optimal exponent is $\alpha = 1/3$ giving the desired upper bound.
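The choice of exponent comes from balancing the two terms: the bound is governed by $\min(1 + 2\alpha, 2 - \alpha)$, which should be as large as possible. The small Python sketch below, given only as an illustration with a hypothetical function name, maximizes this quantity over a rational grid.

```python
from fractions import Fraction


def balanced_exponent(denominator: int = 60) -> tuple:
    """Maximize min(1 + 2a, 2 - a) over a in [0, 1] on a rational grid."""
    grid = [Fraction(i, denominator) for i in range(denominator + 1)]
    best = max(grid, key=lambda a: min(1 + 2 * a, 2 - a))
    return best, min(1 + 2 * best, 2 - best)


if __name__ == "__main__":
    alpha, exponent = balanced_exponent()
    print(alpha, exponent)  # prints 1/3 5/3
```

The two exponents coincide exactly when $1 + 2\alpha = 2 - \alpha$, which recovers the rate $k^{5/3}$.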

For the lower bound we condition on $X_a$ taking value in $(a, a + 1)k^{-2/3}$. Each probability is at
least $\exp(-O(k^{2/3}))$ and therefore the probability that all $X_a$ take such values is

$$\exp(-O(k^{5/3})).$$

Moreover, conditioned on $X_a$ taking such values the probability that

$$Z^{a>b} > X_b - X_a,$$

for all $a > b$ is at least

/k-i \ k / oo \ fc5/3 

This proves the required result. □ 



8 Applications to Hyper-Graphs and Additive Properties 

In this section we prove Theorem 1.7 and give a few examples. The basic idea in the applications
presented so far was that in order to bound correlation between k events of low influences, it suffices 
to know how to bound the correlation between the first k — 1 and the last one. For low influence 
events, using the invariance principle, one obtains bounds coming from half spaces in Gaussian 
space, or majority functions in the discrete space. 

The applications presented now will be of different nature. We will be interested again in 
correlation between k events - however, we will restrict to correlation measures defined in such 
a way that any pair of events are un-correlated. While this is a much more restrictive setting, 
it allows one to obtain exact results and not just bounds. In other words, we obtain that such 
correlation measures for low-influence events depend only on the measure of the sets but not on 
any additional structure. While this may sound surprising, it in fact follows directly from the invariance
principle together with the fact that for jointly Gaussian random variables, pairwise independence
implies independence. We first prove Theorem 1.7.

Proof: The proof follows immediately from Theorem 1.14, Proposition 1.16 and Lemma 2.9. □

Example 8.1 Consider the group $Z_m$ for $m \ge 2$. We will be interested in linear relations over
$Z_m^k$. A linear relation over $Z_m^k$ is given by $L = (L_0, \ell_1, \ldots, \ell_k)$ such that $\ell_i \ne 0$ for all $i \ge 1$ and $\ell_i$
and $m$ are co-prime for all $i \ge 1$. We will restrict to the case $k \ge 3$. We will write $L(x)$ to denote
the logical statement that $\sum_{i=1}^k \ell_i x_i \bmod m \in L_0$. Given a set $A \subset Z_m^n$ we will denote

$$L(A^k) = |\{x \in A^k : L(x)\}|,$$

and $\mu_L$ the uniform measure on $L(A^k)$. We note that for every linear relation we have that $\mu_L$ is
pairwise smooth and that if the set $L_0$ is of size at least 2 then $R$ is connected.

We now apply Theorem 1.7 to conclude that for low influence sets $A \subset Z_m^n$ the number of $k$
tuples $(x_1, \ldots, x_k) \in A^k$ satisfying $\sum_{i=1}^k \ell_i x_i \bmod m \in L_0$ is

$$(1 \pm o(1))|A|^k\left(\frac{|L_0|}{m}\right)^n. \qquad (53)$$

For general sets $A$ we conclude that we have $\underline{A}^S \subset A \subset \overline{A}^T$ where $|S| = O(1)$, $|T| = O(1)$ and (53)
holds for both $\underline{A}^S$ and $\overline{A}^T$.

Example 8.2 We may consider much more general relations. For example we may take $P$ to be a
polynomial in $x_1, \ldots, x_r \in Z_m^r$ and $Q$ to be a polynomial in $x_{r+1}, \ldots, x_k$ such that $P$ and $Q$ both
have roots. Then we can look at the relation defined by the zeros of $PQ$. It is easy to check that
$R$ is connected. Therefore it follows that if $A \subset Z_m^n$ is a set all of whose influences are lower than $\tau$
then

\ £ 'm\ \ £, m\ 

where $c_1$ and $c_2$ are two positive functions. Again if $A$ is not of low influences then there exist finite
sets of coordinates $S$ and $T$ such that $\underline{A}^S \subset A \subset \overline{A}^T$ and the conclusion holds for $S$ and $T$.